All sound recordings used to produce spectrograms and oscillograms for this study have been deposited in the Fonoteca 
Neotropical Jacques Vielliard under accession numbers FNJV 32326-32362. Sound recordings of Leptodactylus used for 
analyses of nightly call variation are deposited in the animal sound archive (animalsoundarchive.org) under accession numbers 
77936-77941. 


Abstract 

Vocalizations of anuran amphibians have received much attention in studies of behavioral ecology and physiology, but 
also provide informative characters for identifying and delimiting species. We here review the terminology and variation 
of frog calls from a perspective of integrative taxonomy, and provide hands-on protocols for recording, analyzing, 
comparing, interpreting and describing these sounds. Our focus is on advertisement calls, which serve as premating isolation 
mechanisms and, therefore, convey important taxonomic information. We provide recommendations for terminology of 
frog vocalizations, with call, note and pulse being the fundamental subunits to be used in descriptions and comparisons. 
However, due to the complexity and diversity of these signals, an unequivocal application of the terms call and note can 
be challenging. We therefore provide two coherent concepts that either follow a note-centered approach (defining 
uninterrupted units of sound as notes, and their entirety as call) or a call-centered approach (defining uninterrupted units as call 
whenever they are separated by long silent intervals) in terminology. Based on surveys of literature, we show that 
numerous call traits can be highly variable within and between individuals of one species. Despite idiosyncrasies of species and 
higher taxa, the duration of calls or notes, pulse rate within notes, and number of pulses per note appear to be more static 
within individuals and somewhat less affected by temperature. Therefore, these variables might often be preferable as 
taxonomic characters over call rate or note rate, which are heavily influenced by various factors. Dominant frequency is also 
comparatively static and only weakly affected by temperature, but depends strongly on body size. As with other taxonomic 
characters, strong call divergence is typically indicative of species-level differences, whereas call similarities of two 
populations are no evidence for them being conspecific. Taxonomic conclusions can especially be drawn when the general 
advertisement call structure of two candidate species is radically different and qualitative call differences are thus 
observed. On the other hand, quantitative differences in call traits might substantially vary within and among conspecific 
populations, and require careful evaluation and analysis. We provide guidelines for the taxonomic interpretation of 
advertisement call differences in sympatric and allopatric situations, and emphasize the need for an integrative use of multiple 
datasets (bio-acoustics, morphology, genetics), particularly for allopatric scenarios. We show that small-sized frogs often 
emit calls with frequency components in the ultrasound spectrum, although it is unlikely that these high frequencies are 
of biological relevance for the majority of them, and we illustrate that detection of upper harmonics depends also on 
recording distance because higher frequencies are attenuated more strongly. Bioacoustics remains a prime approach in 
integrative taxonomy of anurans if uncertainty due to possible intraspecific variation and technical artifacts is adequately 
considered and acknowledged. 

Key words: Amphibia, Anura, sound, vocalization, call, note, pulse, definitions, call variation, call analysis, 
call description, taxonomy, species delimitation 


Introduction 

Taxonomy, the inventory and classification of organisms, is increasingly becoming an integrative discipline 
(Dayrat 2005; Padial et al. 2009). Under the general lineage or evolutionary concept of species (Simpson 1961; 
Wiley 1978; Mayden 1997; De Queiroz 1998, 2007) a variety of characters and lines of evidence can serve to 
delimit species. In this varied toolbox, evidence related to reproductive isolation is particularly powerful as it 
satisfies the biological species criterion, an unambiguous and undisputed means for species delimitation (Mayr 
1969; Padial et al. 2010). Evidence for reproductive isolation can be found through a variety of approaches, 
including postzygotic hybrid inviability, divergent morphological structure of genitals, or differences in behavioral 
characters mediating mate recognition. In amphibians, mechanisms of mate recognition and mate choice involve 
pheromones in salamanders and frogs (Malacarne & Giacoma 1986; Pearl et al. 2000; Toyoda et al. 2004; 
Kikuyama et al. 2005; Byrne & Keogh 2007; Belanger & Corkum 2009; Poth et al. 2012; Starnberger et al. 2013; 
Treer et al. 2013), visual signaling such as foot-waving (Hödl & Amézquita 2001; Toledo et al. 2007; Boeckle et 
al. 2009), elaborated nuptial displays in newts (Halliday 1977) or inflation of, sometimes colorful, vocal sacs (e.g., 
Rosenthal et al. 2004; Hirschmann & Hödl 2006), water surface waves (Walkowiak & Münz 1985), surface 
vibrations (Narins 1990; Cardoso & Heyer 1995; Lewis et al. 2001; Caldwell et al. 2010), acoustic signals (most 
recent summary in Wells 2007), and in some species a multimodal combination of several of these cues (Narins et 
al. 2003, 2005; Taylor et al. 2007; Grafe et al. 2012; Starnberger et al. 2014a; De Sa et al. 2016). 

The taxonomic utility of acoustic signals is well known for numerous organisms (e.g., Littlejohn 1969; Payne 
1986; Alström & Ranft 2003; Jones & Barlow 2003; Bickford et al. 2006; Tishechkin 2014). Call divergence has 
been observed to be involved in sympatric speciation processes in some birds and bats (Sorenson et al. 2003; 
Kingston & Rossiter 2004; but see Slabbekoorn & Smith 2002). It is known, however, that bird songs can be molded 
by learning rather than reflecting genetic determination (Raposo & Höfling 2003). Yet, call recordings have been 
crucial in discovering and delimiting new avian species over the past decades (Alström & Ranft 2003). As far as is 
known, the calls of anuran amphibians are heritable although, interestingly, an experimental study on 
Engystomops pustulosus indicated that early individual acoustic experience may lead to changes of the 
advertisement call (Dawson & Ryan 2009). In insects, bioacoustical taxonomy is typically limited to those taxa 
with conspicuous sounds, but the degree of signal variability differs among orders, families and even congeneric 
species (Tishechkin 2014). Sounds emitted by insects are highly stereotyped and genetically determined, as has 
been studied in the courtship sounds of Drosophila (Kyriacou & Hall 1986; von Philipsborn et al. 2011). Numerous 
cryptic insect species have been discovered based on their sounds (Obrist et al. 2010). 

The field of anuran bioacoustics has seen high research intensity with insights from multiple angles, mostly 
from those of behavioral sciences and behavioral ecology (Bogert 1960; Blair 1963, 1968; Schneider 1966; 
Lescure 1968; Paillette 1971; Salthe & Mecham 1974; Keister 1977; Wells 1977, 1988; Gerhardt 1988; Rand 1988; 
Gerhardt & Schwartz 1995; Ryan 2001; Gerhardt & Huber 2002; Wells & Schwartz 2007), ecology (Schiøtz 1973), 
evolution (Straughan 1973; Ryan 1988; Gerhardt 1994a; Cocroft & Ryan 1995; Goicoechea et al. 2010), and 
physiology (Narins & Zelick 1988; Kelley 2004), with the most comprehensive coverage probably provided by 
Gerhardt & Huber (2002) and Wells (2007). In a number of frogs, speciation by reinforcement of advertisement 
call differentiation has been convincingly demonstrated (especially in Litoria treefrogs: Hoskin et al. 2005; see also 
Littlejohn & Loftus-Hills 1968). Surprisingly, although the function of frog calls as a premating isolation 
mechanism implies a high importance for systematics (Duellman 1963; Blair 1964; Littlejohn 1969), this aspect 
has received less attention in contemporary comprehensive treatments. Nevertheless, the application of 
comparative bioacoustical analyses has globally resulted in the discovery of many morphologically cryptic anuran 
species during the last third of the 20th century and consequently in a boost of species numbers (Glaw & Köhler 
1998; Köhler et al. 2005a; Vences & Köhler 2008). 

In this review, we focus on the utility of anuran vocalizations for taxonomy, with the main goal of providing 
clear guidelines for recording, analyzing and interpreting frog calls in the taxonomic context, specifically for 
species delimitation and species identification. Such a perspective is rare in reviews published to date (but see 
Schneider & Sinsch 2007), and comprehensive hands-on recommendations for carrying out such work are largely 
missing. We first provide brief summaries of current knowledge about morphological, physiological, behavioral, 
ecological and evolutionary aspects of frog bioacoustics, but for deeper insights into these fields we recommend 
the respective original publications or reviews, or the comprehensive accounts in Wells (2007). Our goal is instead 
to provide detailed knowledge on the use of bioacoustics in frog taxonomy. 


Terminology of sounds and sound production 
Sound production in anurans 

Animals produce a variety of sounds of which only those conferring some kind of signal to either conspecifics or to 
potential predators (Collias 1960) are relevant for this review. A signal is defined as the use of specialized, species- 
typical morphology or behavior to influence the current or future behavior of another individual (Owren et al. 
2010; Bradbury & Vehrencamp 2011). 

In general, any kind of acoustic wave can be subsumed under the term sound, whereas other terms are more 
specific (see terminology in Table 1). Acoustical signals are sounds that mediate intraspecific or interspecific 
communication. Vocalizations are those sounds produced by means of the respiratory system of a vertebrate 
animal, typically by the action of vocal cords, while the terms call and song are used in different ways in different 
animal taxa (see below). Frog vocalizations are typically described using a limited number of categories (call, note, pulse, and 
derivatives such as call series, note series, pulse group, etc.). These terms will be defined in more detail in 
subsequent sections. In brief, any vocalization emitted by a frog is considered as a call, independent of its function 
and structure, and might be further categorized if appropriate. Some species also produce sounds and/or surface 
vibrations by shaking (tremulation) displays (e.g., red-eyed treefrogs, Agalychnis callidryas: Caldwell et al. 2010), 
by tapping the gular pouch against the ground (Leptodactylus albilabris: Lewis & Narins 1985; Lewis et al. 2001; 
Hydrolaetare dantasi: Souza & Haddad 2003), or by drumming with their forelimbs on the substrate 
(Leptodactylus syphax: Cardoso & Heyer 1995). We here largely exclude such sound production mechanisms from 
our further discussion and focus on anuran vocalizations. 

TABLE 1. Glossary of terms used to describe animal vocalizations, with a focus on anurans. General definitions are 
standard physical terminology and have been adapted only in some cases (e.g., Vocalization). Specific terms for anuran 
bioacoustics largely agree with standard use (e.g., Duellman & Trueb 1994; Wells 2007) but definitions have been 
refined herein (for the detailed rationale see main text). 


Term 

Definition / Comments 


General definitions 


Acoustical signal 

A sound emitted with the function of eliciting a behavioral response from another, 
conspecific or heterospecific, animal. 

Amplitude of sound 

Difference between peak pressure (corresponding to the peak of the sound wave) and ambient 
pressure. Proportional to sound intensity. Amplitude can be compared among 
recordings obtained under standardized conditions (same recording equipment with 
same level settings, same angle and same distance to sound emitter). 

Amplitude modulation 

Change in the amplitude level of a sound wave over time. A 100% amplitude 
modulation means a change from maximum relative amplitude to full silence. 

Audible sound 

Defined as the sound perceivable by humans: 20-20,000 Hz. 

Bioacoustic / bioacoustical 

Adjective referring to sounds produced by animals. Analogous to the recommended 
usage of acoustic vs. acoustical (Hunt 1955) we suggest using "bioacoustic" when the 
term being qualified designates something with the properties or characteristics of 
sound waves, such as energy, wave or signal; and the use of "bioacoustical" when 
referring to something without such characteristics, such as measurement, trait, 
analysis, or method. 

Envelope 

The shape of the waveform of a pulse, note or call; generally symmetrical about the 
zero axis. 

FFT 

Fast Fourier Transform. An algorithm that decomposes a complex waveform into sine waves 
for analysis; used to produce spectrograms and power spectra. 

FFT window size (FFT resolution) 

Segment length in number of samples per segment used for FFT analysis. Longer 
segments allow for higher resolution of frequency but lower time resolution, and vice 
versa. 

Frequency modulation 

Change in the instantaneous frequency of a signal over time 

Harmonic 

Many sounds have their energy concentrated in several separated, evenly spaced 
frequencies called harmonics. These frequencies are multiples of the lowest (i.e., first 
or fundamental) harmonic, and result from periodic patterns of oscillation, caused by 
back-bouncing after completion of a wave of the longest wavelength (dominant 
frequency). Visually, harmonic-like patterns in a spectrogram can however also be 
caused by more complex acoustic phenomena (e.g., sidebands; see below) and 
artifacts, and will for instance emerge with high FFT bandwidth values. 

Infrasound 

Sounds < 20 Hz, used for long-distance communication in some animals such as 
elephants. 

Nyquist frequency 

Highest frequency that can be digitized without introducing artifacts. Corresponds to 
half the sampling rate of the digitizing (recording) device. 

Oscillogram 

A visual representation of a sound, displaying the changes in amplitude over time. 

Power spectrum 

A visual representation of a sound, showing the relative amplitude of each frequency 
component. 




Zootaxa 4251 (1) © 2017 Magnolia Press 


KÖHLER ET AL. 

Pulse 

Physically, a single unbroken wave train isolated in time by significant amplitude 
reduction. See adapted definition of pulse in the terms used for anuran call descriptions 
below. 

Sampling rate (R) 

Number of amplitude measurements taken per second when digitizing a sound wave 
(e.g., 44.1 kHz sampling rate results in 44,100 samples of amplitude measurement for 
every second). 

Sideband 

Frequency bands in sounds with amplitude or frequency modulation. Sidebands occur 
as additional frequency bands above and below the modulated carrier frequency. They 
might be produced either naturally, or caused by the electronics of the recording 
device, or by interaction with unrelated sounds. 

Sound 

Longitudinal pressure waves travelling through a medium such as air, water or 
substrate. 

Sound intensity 

Product of sound pressure and particle velocity. Proportional to the square of a wave's 
amplitude. 

Sound pressure 

Deviation of local pressure from ambient atmospheric pressure produced by sound 
waves. Note that absolute sound pressure and sound intensity cannot be measured from 
a normal (uncalibrated) sound recording but require application of particular devices 
and protocols in the field or laboratory. 

Sound frequency 

Number of oscillation cycles of sound waves per time unit; cycles per second are 
measured in Hertz (Hz) or kiloHertz (kHz = 1,000 Hz) 

Spectrogram 

A visual representation of a sound, displaying the frequency and amplitude of the 
sound over time. Equivalent to audiospectrogram, sonagram or sonogram (‘sonagram’ 
is a registered trademark of Kay Sonagraphs and its use is not recommended). 

Tonal 

A sound consisting of a single frequency component at any time instant (although this 
can vary in time, hence being modulated). 

Ultrasound 

Sound frequencies > 20,000 Hz, frequently registered in insects, echolocating bats and 
whales, but also in some frogs. 

Vocalization 

Any kind of sound produced by animals by means of their respiratory system, typically 
by the action of vocal cords, independent of its categorization (song, call) or structure 
(tonal, pulsatile, pulsed, etc.). Some animal sounds are produced using different 
mechanisms and therefore do not qualify as vocalizations under this definition. 

Wavelength 

Period of a wave - distance at which the shape of a wave is repeated. Wavelength is 
inversely proportional to frequency. 

Windowing 

Segments in FFT analysis with sharp start or stop would result in broad frequency 
bands. To avoid such artifacts, different windowing functions taper the onset and offset 
of each segment gradually. Usage of any of these (Hanning, Hamming, Blackman etc.) 
will emphasize different regions of the time slice. Many anuran call analyses use 
Hanning windowing with a bandwidth of 256 or 512 for spectrograms. 
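
The relations among sampling rate, Nyquist frequency and FFT window size defined in this first part of the table can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the values 256 and 512 are treated as segment lengths in samples, and the speed of sound (343 m/s, roughly air at 20 °C) is an assumed value that is not given in the text.

```python
def nyquist_frequency(sampling_rate_hz):
    """Half the sampling rate: the highest frequency digitized without artifacts."""
    return sampling_rate_hz / 2.0


def fft_resolution(sampling_rate_hz, window_size):
    """Return (frequency step in Hz, segment duration in s) for one FFT segment.

    A longer window narrows the frequency step but lengthens the time slice,
    which is the trade-off described under "FFT window size".
    """
    frequency_step_hz = sampling_rate_hz / window_size
    segment_duration_s = window_size / sampling_rate_hz
    return frequency_step_hz, segment_duration_s


def wavelength_m(frequency_hz, speed_of_sound_m_s=343.0):
    """Wavelength is inversely proportional to frequency.

    The default speed of sound is an assumption (air at about 20 degrees C).
    """
    return speed_of_sound_m_s / frequency_hz


if __name__ == "__main__":
    rate = 44100  # 44.1 kHz, the example sampling rate given in Table 1
    print("Nyquist frequency (Hz):", nyquist_frequency(rate))
    for size in (256, 512):  # assumed to be segment lengths in samples
        step, duration = fft_resolution(rate, size)
        print(size, "samples:", round(step, 1), "Hz per step,",
              round(duration * 1000, 1), "ms per segment")
```

For a 44.1 kHz recording, doubling the window from 256 to 512 samples halves the frequency step (from about 172 Hz to about 86 Hz) while doubling the segment duration (from about 5.8 ms to about 11.6 ms).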


Terms and definitions suggested for describing anuran vocalizations (see also general definitions in first part of this 
table for additional terms) 


Amplitude modulation 

Changes in the amplitude or envelope of a signal over time; if the sound is completely 
interrupted it is 100% modulated. 

Call 

An acoustic unit of frog vocalization, a distinct sound; a call is separated from other 
calls by periods of silence (typically much longer than the call); duration of calls of one 
type is usually consistent and regular; a call may be emitted solely (i.e., not as 
mandatory part of a series); next higher level of acoustic element is the call group or 
call series. Might be composed of one or several notes of the same type (simple call) or 
of different types of notes (complex call). 

Call duration 

The duration of a single call, no matter if composed of single or multiple notes; 
measured from beginning to the end of the call. 


Call duty cycle or calling effort 

The fraction of the signaling period where a call is produced. Can be calculated as the 
ratio of the call duration to the call period, or call rate multiplied by call duration [in 
seconds, hours or dimensionless (percentage as ratio of sound to silence)]. 

Call group 

Calls may be organized into groups which are separated from other such groups by 
periods of silence much longer than the inter-call intervals; inter-call intervals within 
groups are stable or changing in a predictable pattern. 

Call period 

Call duration plus inter-call interval, or, time between the beginning of one call to the 
beginning of the consecutive call. 

Call (repetition) rate 

Instantaneous call rate. Number of calls emitted in a defined period of time. It can be 
either calculated as reciprocal of the call period or as the ratio of the absolute number 
of calls and the absolute duration in which these calls were emitted—the latter may not 
be ‘instantaneous’. Thus, the way of calculation must be precisely stated. The value 
should be provided as calls per minute. 

Call series 

A call group, within which calls are repeated at regular intervals. 

Call type 

A category of vocalizations emitted in a particular social context and particular 
function (reproductive, aggressive, defensive). See main text and Table 2 for a list of 
call types distinguished in anurans. The advertisement call repertoire of some species 
may consist of more than one type of advertisement call. 

Dominant frequency 

The peak frequency of the call (or note); the frequency containing the highest sound 
energy (in Hz or kHz). 

Bandwidth 

Total range of frequencies present in the emitted sound. Total bandwidth is typically 
difficult to measure in natural recordings. Measurements can more easily be carried out 
at a given threshold level which must be specified and kept constant in all 
measurements for comparison purposes; we recommend measuring 90% bandwidth 
(-10 dB threshold; containing 90% of the sound energy). In recordings with strong 
background noise, only approximate prevalent bandwidth (range of frequencies with at 
least some sound energy assignable to the call) can be estimated. 

Frequency modulation 

Changes in frequency over time. May be ascending, descending, ‘v’ shaped, or even 
sinusoidal. 

Inter-call interval 

The interval between two consecutive calls, measured from the end of the call to the 
beginning of the consecutive call. 

Inter-note interval 

The interval between two consecutive notes within the same call, measured from the 
end of one note to the beginning of the consecutive note. 

Inter-pulse interval 

The interval between two consecutive pulses, its duration measured from the end of 
one pulse to the beginning of the consecutive pulse. Should only be defined and 
measured in such cases where fully silent intervals between pulses occur (i.e., 100% 
amplitude modulation). 

Note 

Main subunit of a call. Calls are often broken into smaller subunits (= notes) by 100% 
amplitude modulation with only short intervals between them relative to length of call. 
Calls can also consist of only a single note. Notes might be further subdivided into 
pulses. 

Note type 

A call might consist of very similar notes arranged in a stereotyped or more complex 
manner (one note type = simple calls). But the notes might also differ from each other 
in temporal, spectral and/or energetic properties and different types of notes can be 
defined (more than one note type = complex calls). 

Note group 

Notes may be organized into groups which are separated from other such groups by 
periods of silence that are longer than the intervals between notes in a group; spacing 
of notes within groups is regular or changing in a predictable pattern. 

Note series 

A note group with notes repeated at regular intervals. 

Note duration 

The duration of a single note within a call; measured from beginning to the end of the 
note. 


Note (repetition) rate 

Number of notes repeated within a defined time period within a call or note series. The 
value should be provided as notes per second. 

Pulsatile 

A sound consisting of poorly defined energy bursts, namely fast alternating amplitude 
modulation without intermittent silence and no clearly countable peaks (if clearly 
countable, amplitude peaks refer to pulses). 

Pulse 

A single burst of sound energy, not further subdivided into subunits, separated by 
strong amplitude modulation from other pulses. Amplitude modulation is often less 
than 100%; hence pulses in anuran calls often are not separated by a fully silent 
interval due to intrinsic properties of the call, although background noise can lead to a 
similar situation. See also physical definition of pulse in first part of this table. 

Pulsed 

A sound consisting of a series of well-defined energy bursts (pulses). 

Pulse duration 

The duration of a pulse, measured from one amplitude minimum to the next amplitude 
minimum. As background noise can mask these minima, an amplitude threshold (i.e., 
% from the maximum) can be applied in order to make measurements comparable. 

Pulse group 

Pulses might be arranged into distinct groups, separated from other such groups by an 
unpulsed part of a note, or differing in intensity or spectral frequency. 

Pulse period 

Pulse duration plus inter-pulse interval. 

Pulse (repetition) rate 

Instantaneous pulse rate. Number of pulses repeated in a defined period of time within 
a note. The value should be provided as pulses per second. It can be either calculated as 
reciprocal of the pulse period (time from the beginning of one pulse to the beginning of 
the next consecutive pulse) or as the ratio of the absolute number of pulses and the absolute 
duration in which these pulses were emitted—the latter may not be ‘instantaneous’. 
Note that the pulse rate may vary during a note or call and such variation is best 
averaged using the pulse/duration ratio. Thus, the way of calculation must be precisely 
stated. 

Pulse series 

Not a separate category but rather a descriptive term to refer to a call or note made up 
by a train of pulses, especially if these are of regular spacing, intensity and frequency. 

Pulse train 

Synonym of pulse series. 
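
The temporal terms defined in Table 1 (period, inter-call interval, rate and duty cycle) are linked by simple arithmetic, which can be sketched as follows. This is a minimal illustration and not a prescribed protocol: it assumes that calls (or pulses) have already been annotated as (onset, offset) pairs in seconds, all function names are hypothetical, and averaging across periods before taking the reciprocal is an assumption made here for brevity.

```python
from statistics import mean


def periods(events):
    """Time from the beginning of one event to the beginning of the next."""
    return [nxt[0] - cur[0] for cur, nxt in zip(events, events[1:])]


def intervals(events):
    """Silent interval from the end of one event to the beginning of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(events, events[1:])]


def rate_from_periods(events, per_seconds=1.0):
    """Rate as the reciprocal of the mean period (needs at least two events).

    Use per_seconds=60.0 for calls per minute, 1.0 for pulses per second.
    """
    return per_seconds / mean(periods(events))


def rate_from_count(events, per_seconds=1.0):
    """Rate as the number of events divided by the time in which they were emitted."""
    total_duration = events[-1][1] - events[0][0]
    return per_seconds * len(events) / total_duration


def duty_cycle(events):
    """Ratio of mean event duration to mean period (dimensionless)."""
    mean_duration = mean(end - start for start, end in events)
    return mean_duration / mean(periods(events))


if __name__ == "__main__":
    # Hypothetical annotation: three 0.5 s calls starting at 0, 2 and 4 s.
    calls = [(0.0, 0.5), (2.0, 2.5), (4.0, 4.5)]
    print("call periods (s):", periods(calls))
    print("inter-call intervals (s):", intervals(calls))
    print("calls/min from periods:", rate_from_periods(calls, 60.0))
    print("calls/min from count:", rate_from_count(calls, 60.0))
    print("duty cycle:", duty_cycle(calls))
```

For the hypothetical annotation above, the period-based rate is 30 calls per minute whereas the count-over-duration rate is 40 calls per minute, which illustrates why the way of calculation must be stated. The same functions apply to pulses within a note when called with the default of one second.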


Where detailed analyses have been carried out, sound production in frogs has mainly been found to occur 
during expiration (i.e., by air passing from the lungs into the vocal sac, through the larynx: Martin 1971; Martin & 
Gans 1971; Duellman & Trueb 1994). Well-studied exceptions are fire-bellied toads of the genus Bombina 
(Bombinatoridae), which produce calls during inspiration (Fig. 1) (Zweifel 1959; Lörcher 1969), and Discoglossus 
(Alytidae), which produce intermittently inspiratory and expiratory notes (Weber & Schneider 1971; Weber 1974; 
Glaw & Vences 1991). 

In frogs with expiratory calls, positive pressure caused by contraction of muscles in the buccal cavity pumps 
air into the lungs. Then, in the second phase of the respiratory cycle, contraction of trunk muscles forces air 
back from the lungs into the buccal cavity, passing through the larynx where it causes the vocal cords to vibrate and 
produce sounds, further modified by muscles of the larynx (De Jongh & Gans 1969; Gans 1973; Gridi-Papp 2008; 
Ryan & Guerra 2014) and other related structures (Gridi-Papp et al. 2006; Kime et al. 2013). 

As reported by Wells (2007), and as is evident from the many species of bufonids or microhylids with very long 
calls, the inhalation-exhalation system during call emission can be more complex. In such species, 
contractions of trunk muscles might be pulsatile (Martin 1972), and such series of short contractions would lead to 
alternating periods of inspiration and expiration (Gans 1973). For most of these species it has not yet been assessed 
whether sound production also occurs during short inhalatory phases, but evidence has recently been provided for 
such inspiratory sound production in the microhylid frog Dermatonotus muelleri (Giaretta et al. 2015). 

In males of many species of frogs, vocal sacs connect to the buccal cavity, typically via slit-like openings, and 
are inflated during vocalization. Their morphology varies from a median single subgular sac, to bilobate and paired 
subgular sacs or to paired lateral sacs (Fig. 2). Anuran vocal sacs might be slightly or highly distensible and were 
early defined as either internal or external vocal sacs (Liu 1935). Their form and color might also have a role in 
visual signaling (Starnberger et al. 2014b). Vocal sacs radiate the sound energy to the environment (Martin 1972; 
Wells 2007), and their shape and size influence some properties of the sound signal, such as frequency modulation 
(Dudley & Rand 1991) although, as far as known, they do not act as cavity resonators altering the frequency 
composition (Rand & Dudley 1993). Surprisingly, how the structure and size of vocal sacs influence sound and 
sound transmission is poorly known (Wells 2007). 

FIGURE 1. Inspiratory calling of Bombina bombina (above) and expiratory calling in Pelophylax kl. esculentus (below). 
Oscillograms show one call and small photos show state of vocal sac at the respective time indicated. Video and sound recorded 
with a Nikon D750 and processed in Windows Movie Maker software; oscillograms drawn in CoolEdit Pro 2.0 software. 
Recording of Bombina made at Schorfheide-Chorin Reserve, Germany on 24 May 2015; recording of Pelophylax made at 
Riddagshausen Reserve, Braunschweig, Germany, on 2 June 2015. 

FIGURE 2. Vocal sac variation in anurans. Except for Ceratophrys cranwelli (emitting a warning call with open mouth) and 
the pictured specimens of Hyperolius (emitting advertisement calls with aggressive components in male-male combat) all 
pictured specimens are emitting advertisement calls. Phrynobatrachus alleni is suspected to use its yellow vocal sac for visual 
signalling, and a visual function is also probable for the bright white vocal sacs of the two Guibemantis species, and of other 
frogs. All hyperoliids (such as Hyperolius viridiflavus shown here) have gular glands on the vocal sac that might have a visual 
function, in addition to probably producing pheromones (Starnberger et al. 2013). Note that the distinction between vocal sac 
types is not always clear; for instance, the vocal sacs of Rana temporaria and Boophis tsilomaro can be considered as partially 
paired subgular and partially paired lateral. All photos by the authors except Trachycephalus typhonius and Pseudopaludicola 
jaredi (by Daniel Loebmann). 

In many species of frogs, vocal sacs are small and inconspicuous, and sometimes absent. This involves many 
diurnal-calling species, usually terrestrial or semiaquatic, in which an inflated vocal sac might be disadvantageous 
by drawing the attention of visually oriented predators. Vocal sacs are often lacking in frogs living in noisy 
environments such as fast-flowing torrents where long-range communication is difficult (summary in Wells 2007), 
or also in mute species (Jim & Caramaschi 1979). However, numerous exceptions to such trends are known (Wells 
2007) and thorough analyses of trait evolution are needed to understand ecological correlates of vocal sac variation. 
Certain taxa, such as some Asian Limnonectes (Dicroglossidae), some bufonids, and some hyperoliids are 
considered voiceless (Rödel et al. 2003), although in some cases, calls from putatively voiceless species have been 
reported (Matsui 1995; Orlov 1997; Tsuji & Lue 1998; Vences et al. 2004). The identification of Barbourula 
kalimantanensis (Bombinatoridae) as the first known lungless anuran species (Bickford et al. 2008) might indicate 
that it is voiceless as well. 

In some frogs with relatively inconspicuous vocal sacs, males have a larger tympanum than females, often a 
much larger one. As shown in North American bullfrogs (Lithobates catesbeianus) and African petropedetids 
(Purgue 1997; Narins et al. 2001), these tympana serve not only for hearing but also radiate a substantial portion of 
call energy to the environment. 

The respiratory ventilation system, with air cycling from the lungs into the buccal cavity/vocal sac and back, 
typically takes place with a closed mouth, without releasing air to the outside (Gridi-Papp 2008; Fig. 1). Defensive, 
distress, alarm and warning calls are a special case: they are instead emitted with an open mouth (Fig. 2: 
Ceratophrys cranwelli) (Toledo et al. 2015a). Similarly, African frogs of the genus Conraua (including the largest 
anuran on Earth, C. goliath) and Southeast Asian frogs of the genus Staurois emit advertisement calls with open 
mouths (Amiet 1989; Rödel & Branch 2002; Rödel & Bangoura 2004; Boeckle et al. 2009; Preininger et al. 2016). 

A totally different mechanism of sound production is known in the aquatic Pipidae (Yager 1996). Frogs in the 
genera Hymenochirus, Pipa and Xenopus call motionless, without obvious movement of an air column, and lack 
vocal cords in their box-like enlarged larynx (Rabb 1960; Rabb & Rabb 1963; Yager 1996), whereas 
Pseudhymenochirus has reverted to an air-stream driven sound production mechanism (Irisarri et al. 2011). In 
Xenopus, vocalizations are based on implosion of air into a vacuum caused by rapidly moving structures in the 
larynx (Yager 1992, 1996), and a similar mechanism can be assumed for Pipa and Hymenochirus. Although pipid 
acoustic signals have been termed courtship songs by some authors (e.g., Leininger & Kelley 2015), we suggest 
describing them with the same terminology as vocalizations of other anurans. 

Underwater calling in non-pipid adult frogs apparently occurs more frequently than usually observed and has 
been reported for Telmatobiidae (Cei & Roig 1965), Leptodactylidae (Dudley & Rand 1992), Ranidae (Boatright- 
Horowitz et al. 1999), Pelobatidae (Frommolt et al. 2008) and Megophryidae (Zheng et al. 2011). 


Functional categories of anuran vocalizations 

Anurans emit a variety of calls in different contexts. These have been subdivided into different categories (types) of 
calls by Bogert (1960) and with minor modifications (e.g., Littlejohn 1977; Wells 1977, 1988, 2007) this 
classification still applies today. We here mainly follow the functional categorization proposed by Toledo et al. 
(2015a), according to which anuran vocalizations are subdivided in three overarching categories: reproductive, 
aggressive and defensive calls (see Table 2), each with various subcategories. 

Doubtless, reproductive calls are the anuran vocalizations most commonly heard and of highest value in 
taxonomy. This in particular applies to the sound signal most frequently emitted by males (in some species also by 
females; Emerson & Boyd 1999; Boistel & Sueur 1997; review in Preininger et al. 2016) during the breeding 
season, the advertisement call (sensu Wells 1977). This call type was named mating call by Bogert (1960) and 
referred to under yet different names by different authors (e.g., breeding call, sex call, sex trills, courtship call, 
initial call, warm up call, sporadic call or chuckle call; Larson 2004; Toledo et al. 2015a). Advertisement calls are 
those conspicuous calls typically heard in the wild and they apparently serve two main functions: attracting 
potential mates and conveying territorial information to conspecifics. In numerous examples, playback 
experiments have shown female frogs approaching conspecific advertisement calls (Wells 2007). For taxonomic 
purposes, analyses almost exclusively focus on advertisement calls, because (1) they are the most frequent calls in 
most species and are easy to record, and (2) they are emitted in the context of mating and thus can be expected to 
convey species-specific information involved in pre-zygotic isolation. 

A further subcategory of reproductive calls consists of the male and female release calls (termed contact calls 
in Penna & Veloso 1987; other synonyms in Toledo et al. 2015a), emitted by non-receptive individuals in response 
to an amplexus attempt. Species with derived mating behavior without amplexus often lack release calls (e.g., 
Malagasy frogs in the subfamily Mantellinae; Vences et al. 2007; but see Willaert et al. 2016 for release calls in 
Nyctibatrachidae), but this subcategory is in general observed across the majority of anuran families. Release calls 
can differ among closely related species (e.g., Castellano et al. 2002a) and have been proposed as a possible 
taxonomic character (Grenat & Martino 2013). A special case is the post-oviposition male release call, emitted by 
the male during amplexus, after oviposition and prior to the release of the female. Little research has been devoted 
to possible interspecific signalling through release calls. Although infrequent, female release calls might in some 
cases be directed to males of other species, with a putative function of avoiding heterospecific mating. In such 
situations, selective pressures could act to stabilize inter-specific differences in such vocalizations, increasing their 
value for taxonomy. Equally understudied are the relationships between the structure of advertisement and release 
calls (but see Leary 2001; Castellano et al. 2002a). According to our observations, often the general structure of 
these two call types bears some similarity. For example, species with a pulsed advertisement call usually also have 
a pulsed release call. This, however, might be influenced more strongly by morphological constraints of the sound- 
producing apparatus than by selective forces. 

An additional subcategory of reproductive calls consists of the courtship calls of males and females. These are 
sounds emitted in some species when males and females are in close proximity. The male courtship calls might be 
just modifications of regular advertisement calls (e.g., with longer durations of notes or calls; Rosen & Lemon 
1974; Wells 1980; Wells & Taigen 1986; Klump & Gerhardt 1987). In other cases, male courtship calls can 
distinctly differ from advertisement calls, especially in species with complex mating behaviors, and sometimes 
more than one type of courtship call exists (Wells 2007). In some species, females respond with courtship calls, but 
such behavior has been documented in a few species only. Little information on courtship calls is available. They 
might be genuinely restricted to a limited number of species, or just be rarely heard and recorded, given the rarity 
of observing courtship behavior and the fact that they are often less intense than advertisement calls (Wells 2007). 

Rain calls and amplectant calls have also been described as types of reproductive calls (Toledo et al. 2005a). 
Their function is poorly understood. They are rarely emitted and therefore have received little attention in 
taxonomic comparisons. 

Terminology within the overarching category of aggressive calls is less unanimous in the literature, also 
because a reliable identification of the function of such calls requires in-depth study of a species' social behavior. 
Calls included in one of the subcategories of aggressive calls (Table 2) are not always easy to distinguish from the 
advertisement call, given that often the advertisement call confers both mating and territorial signals, and might be 
composed of different note types having respectively a stronger function in attracting females or signalling to other 
males (Narins & Capranica 1978; Toledo et al. 2015a). Yet, in some species, specific signals are emitted during 
close contact or fighting between males. Aggressive calls usually differ from advertisement calls of the same 
species in temporal variables, although similarities in general structure and frequency are often apparent, as with 
release calls (Wells 2007). Some species possess a graded signaling system with variable aggressive and 
advertisement calls that can grade into each other, for example, by a gradual change of duration or number of notes 
(Wells 2007; Toledo et al. 2015a). Some authors have also used the term aggressive call to refer to defensive/ 
distress calls when these are emitted while retaliating to a predator. We follow Toledo et al. (2015a) in restricting 
aggressive calls to those emitted in an intraspecific behavioral context. 

Defensive calls (including distress, warning and alarm calls) are typically emitted in response to an attack or 
approach of a potential predator and probably are aimed at startling or deterring it. The most common defensive 
calls are distress calls, which have a characteristic structure, often being loud screams or hissing sounds emitted 
with open mouth. Similar hissing sounds can also be emitted by some larger frogs during the approach of a 
potential predator, or even while attacking this predator, typically along with a threat display; we name these calls 
warning calls (Toledo et al. 2015a). A third subcategory of defensive calls is made up of alarm calls; these can be 
screams but might also be other kinds of vocalizations, emitted by frogs without evident threat display and are 
probably emitted to confuse the predator or to alarm conspecifics. Although many anurans have the potential to 
emit defensive calls, reports and recordings are rare. The general structure of distress calls is rather similar across 
species (Hodl & Gollmann 1986) and it is unlikely that these could provide reliable taxonomic information. 
However, in some extreme cases, as in extinct anurans, the only call information available for a species might refer 
to defensive calls (Martinelli & Toledo 2016). 

We here propose in a preliminary way a new category of calls, feeding calls, to refer to the sounds emitted by 
tadpoles and juvenile frogs of different species, often in the context of feeding. For instance, Reeve et al. (2011) 
documented calls of a probable aggressive function during feeding in tadpoles of the Malagasy frog Gephyromantis 
azzurrae that fit into this category. The sound production mechanism of these immatures is unknown, but it is 
likely that sounds are produced with the respiratory system and thus qualify as vocalizations because tadpoles of G. 
azzurrae emit sounds during an opening and closing of the mouth (Reeve et al. 2011). Also, juvenile spadefoot 
toads (Pelobates fuscus) emit feeding calls in the presence of prey, probably in the context of a general arousal (ten 
Hagen et al. 2016). The underwater calls of tadpoles of Ceratophrys (Natale et al. 2011; Salgado-Costa et al. 2014) 
probably fit into the warning call subcategory, but might also in part represent feeding calls. 

Anecdotal evidence suggests feeding calls occurring in immature frogs of additional species. Observations 
have been made in captive-bred juveniles of Xenopus victorianus (F. Glaw, pers. obs.), in juveniles of Pelobates 
fuscus (ten Hagen et al. 2016; see also Nollert 1984) and in metamorphs of Phyllomedusa burmeisteri (Toledo et 
al. 2015a). Such calls might be emitted in a competitive context as has been hypothesized in fishes (Amorim & 
Hawkins 2000; Amorim et al. 2004; Polgar et al. 2011). Alternatively, calls emitted by metamorphs or juveniles 
while foraging could be related to group aggregation, as an ecological strategy of defense. Bokermann (1974) 
reported on advertisement calls in pedomorphic male metamorphs of Sphaenorhynchus bromelicola, and distress 
calls emitted by juveniles have been described for some Neotropical species, such as Hypsiboas faber, H. lundii, 
Leptodactylus chaquensis, and L. labyrinthicus (Sazima 1975; Toledo et al. 2005; Toledo & Haddad 2009). 
Whether these calls might also in part represent feeding calls is unknown. However, due to their rarity and the poor 
understanding of their function, we do not recommend their use in anuran taxonomy. 

As emphasized by Littlejohn (1969), isolating mechanisms such as divergence in reproductive signals are often 
intricately related to the speciation process and consequently have an immediate relevance for species delimitation. 
It is therefore reasonable to assume (and has been established in a vast number of case studies) that reproductive 
calls are very suitable characters for anuran taxonomy. Because the most common reproductive calls are 
advertisement calls, these can readily be recorded, described, and compared among species. In the following 
sections, we will focus on these calls, although most of our proposed terminology and discussion will also apply to 
other call types. 


Spectral and temporal variables in anuran vocalizations 

When using bioacoustical traits in taxonomy, a detailed, correct, and verifiable description is essential to 
characterize the traits and their variation, and thereby provide a rationale for species delimitation. Describing and 
illustrating calls requires a basic understanding of the underlying physics of sound. 

Sound waves represent a pattern of disturbance in pressure of a transfer medium (typically air or water in 
anuran sounds). The source is usually a vibrating object (such as a vocal cord) disturbing molecules in the medium; 
this disturbance is then propagated by these molecules to those next to them, and so on. Thus, inside the transfer 
medium, the molecules show a movement pattern leading to alternating higher and lower density (compression and 
rarefaction of molecules in the medium). In their simplest form these fluctuations of the pressure in the medium 
can be represented as a sinusoidal curve. The distance over which such a curve repeats is the wavelength, and the 
time it takes to repeat is the period; inversely proportional to both is the frequency of the sound (number of wave 
repeats per time unit). 
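
As a minimal sketch of these relations (not part of the original protocol), the following Python snippet generates a pure sine tone and computes its period and wavelength. The function names, the 2000 Hz example tone, the 44.1 kHz sampling rate and the speed of sound of 343 m/s (air at roughly 20 °C) are illustrative assumptions.

```python
import math


def sine_wave(frequency_hz, duration_s, sample_rate_hz, amplitude=1.0):
    """Return samples of a pure sine tone (relative, uncalibrated amplitude)."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]


def period_s(frequency_hz):
    """Time needed for one full wave repeat; the inverse of frequency."""
    return 1.0 / frequency_hz


def wavelength_m(frequency_hz, speed_of_sound_m_s=343.0):
    """Distance covered by one wave repeat in the transfer medium.

    The default speed is an assumed value for air; sound travels
    considerably faster in water, giving longer wavelengths there.
    """
    return speed_of_sound_m_s / frequency_hz


# Hypothetical example: a 10 ms tone at 2000 Hz, sampled at 44.1 kHz.
tone = sine_wave(2000.0, 0.01, 44100)
print(len(tone), period_s(2000.0), wavelength_m(2000.0))
```

Doubling the frequency halves both the period and the wavelength, whereas changing the transfer medium alters only the wavelength.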

Pure sinusoidal (sine) waves are thus easy to describe but are uncommon in nature. Although some anuran 
calls resemble such sine waves, most calls consist of a complex waveform presenting modulations of frequency 
(caused by increasing/decreasing wavelength) and amplitude (caused by increasing/decreasing sound pressure) 
within a wave. The resulting overall waveform is graphically illustrated as an oscillogram (amplitude vs. time; Fig. 3). 

TABLE 2. The anuran vocalization repertoire, organized by category and subcategory (as reviewed by Toledo et al. 2015a; with the addition of the feeding call category), usefulness for taxonomy, 
recording difficulty (based on the frequency of observation in nature), and related/suggested selective pressure promoting its divergence or convergence. 

[Figure 3 annotations: relative amplitude is proportional to sound intensity and sound pressure and is not to be expressed in absolute values (except if calibrated); sound energy is concentrated at the dominant frequency; use the oscillogram to measure temporal variables of the sound.] 

FIGURE 3. Two calls of Dryophytes andersonii (recording taken from Elliot et al. 2009) depicted in exemplary spectrogram, 
oscillogram and power spectrum, showing units and explaining details and analytical purpose of the graphs. Graphs were 
produced with the R package Seewave (Sueur et al. 2008a). 


A natural sound consists of a base frequency (fundamental frequency) and several additional frequencies. 
These additional frequencies are integer multiples of the fundamental frequency and are called harmonics. 
Ultimately, harmonics result from extra oscillations that originate in different regions of the vibrant structure once 
the main wave bounces back from the fixed extremes of that structure. The fundamental harmonic (also called first 
harmonic) will emanate from oscillations between the two extremes with amplitude peaks in the middle of the 
cord; the second harmonic will be caused by oscillations between the extremes and one node located at half the 
cord length, and so on until reaching the motionless extremes of the cord. As a result, the frequency of the resulting 
harmonics will be a multiple of the main oscillation or fundamental frequency (for a detailed explanation see 
Bradbury & Vehrencamp 2011). 
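
For an idealized cord fixed at both ends, the harmonic series described above can be written compactly. The symbols L (cord length) and v (speed of the wave along the cord) are assumptions introduced here for illustration only; they are not measured quantities in this paper.

```latex
f_n = n \, f_1 = \frac{n \, v}{2L}, \qquad n = 1, 2, 3, \dots
```

Here f_1 is the fundamental frequency (first harmonic), and the n-th harmonic has n - 1 nodes between the two fixed extremes. A fundamental of 500 Hz, for example, would be accompanied by harmonics at 1000 Hz, 1500 Hz, and so on.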

Frequencies observed in natural vocalizations of anurans often depart from this basic model. The vibration of 
the cords is influenced by the form of the vocal cords and by fibrous material sometimes attached to them, which 
determines the presence and the amplitude of harmonics (Wilczynski et al. 1993; McClelland et al. 1996, 1998; 
Gridi-Papp et al. 2006). In the rather rare cases where the vocal cord vibrations are near sinusoidal, little energy 
will be visible in the higher-frequency harmonics, but the less sinusoidal a vibration is, the more the higher- 
frequency harmonics will become apparent. Modifying this effect, and further adding to the complexity of 
waveforms, are various factors like deflection, rebounding and reflection in the buccal cavity (Gerhardt & Huber 
2002; Bradbury & Vehrencamp 2011), which lead to overlap, cancellation or amplification of particular 
frequencies. 

The method applied to analyze and visualize the frequency composition of complex waveforms was developed 
mainly by J.B.J. Fourier (1768-1830) and J.P.G.L. Dirichlet (1805-1859) (Bradbury & Vehrencamp 2011). 
Basically, a Fourier analysis decomposes a function of time (the waveform in acoustics) into its component 
frequencies and their relative amplitudes, converting a time-domain function into a frequency-domain function. Hence, 
periodic and continuous waveforms are decomposed into simple sine waves, which are easier to measure. The Fast 
Fourier Transformation (FFT, as used by modern analysis software) accelerates the calculations by discretizing the 
time domain into multiple fragments of sound. 

The results can be graphically represented as a spectrogram (time vs. frequency; Fig. 3) and as a power 
spectrum (frequency vs. amplitude; Fig. 3). When used on digitized sounds, the output of a FFT analysis will 
depend on the sampling accuracy, sampling rate (number of amplitude measurements per time unit), length of the 
discrete fragments (FFT length or number of points in FFT), the overlap between the discrete fragments (expressed 
in percent or points), and the selected windowing function (Table 1). 
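
To illustrate how these parameters interact, the sketch below cuts a digitized signal into overlapping, windowed fragments, which is the step that precedes the Fourier transform of each fragment. It is a simplified illustration rather than the procedure of any particular analysis program; the function names and the example values (44.1 kHz sampling rate, FFT length of 512 points, 50% overlap) are assumptions.

```python
import math


def hann_window(length):
    """Symmetric Hann ("Hanning") window with the given number of points."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1)) for i in range(length)]


def frame_signal(samples, fft_length, overlap_percent):
    """Cut a signal into overlapping, Hann-windowed fragments.

    Returns the list of fragments and the hop size, i.e. the number of
    points by which consecutive fragments are shifted.
    """
    hop = max(1, int(round(fft_length * (1.0 - overlap_percent / 100.0))))
    window = hann_window(fft_length)
    frames = []
    for start in range(0, len(samples) - fft_length + 1, hop):
        fragment = samples[start:start + fft_length]
        frames.append([s * w for s, w in zip(fragment, window)])
    return frames, hop


# Hypothetical example: one second of a 1000 Hz tone sampled at 44.1 kHz.
sample_rate_hz = 44100
signal = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate_hz) for i in range(sample_rate_hz)]
frames, hop = frame_signal(signal, fft_length=512, overlap_percent=50)
print(len(frames), hop)
```

A larger overlap yields more fragments per second and thus a smoother-looking spectrogram, but it does not change the spectral resolution, which depends on the sampling rate and the FFT length.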

Spectrograms and power spectra serve to illustrate the spectral properties characterizing a sound, but are 
strongly dependent on the above parameters. Most importantly, the FFT function will always result in an 
unavoidable trade-off between temporal and spectral resolutions. An increase in spectral resolution will necessarily 
reduce the temporal resolution and researchers should bear this in mind when choosing FFT parameter values. 
Different researchers can select different combinations of parameters and produce radically different visualizations 
of sound (Fig. 4). 
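
The trade-off can be quantified with a short calculation. In this sketch, the width of one frequency band is taken as the sampling rate divided by the FFT length, and the duration of one fragment as the FFT length divided by the sampling rate; the function name and the 44.1 kHz sampling rate are assumptions, and the effective resolution in practice also depends on the window function and overlap.

```python
def fft_resolution(sample_rate_hz, fft_length):
    """Return (width of one frequency band in Hz, duration of one fragment in ms)."""
    freq_res_hz = sample_rate_hz / fft_length
    time_res_ms = 1000.0 * fft_length / sample_rate_hz
    return freq_res_hz, time_res_ms


# Hypothetical example: a recording sampled at 44.1 kHz.
for fft_length in (128, 256, 512, 1024):
    band_hz, fragment_ms = fft_resolution(44100, fft_length)
    print(f"FFT = {fft_length:4d}: {band_hz:7.1f} Hz per band, {fragment_ms:5.1f} ms per fragment")
```

The product of the two values is constant, so each doubling of the FFT length halves the band width and doubles the fragment duration.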

Although a spectrogram with reasonably high frequency resolution provides information about the structure of 
a sound over time, it is not suitable for providing precise information on its temporal variables. In anuran calls, this 
refers to measurements of the duration of call, note, pulse, and of intervals. These temporal variables should be 
measured on the oscillogram. Spectral variables can be visualized in spectrograms (especially those that involve 
changes of frequency over time, as frequency modulation), but should not be manually assessed from the 
spectrogram either. Instead, they should be measured using a power spectrum (Zollinger et al. 2012), or using an 
integrated frequency analysis tool available in some programs. 
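
As a hedged illustration of measuring a spectral variable from a power spectrum rather than by eye, the sketch below computes a relative power spectrum and reads off the dominant frequency. It uses a deliberately simple (slow) discrete Fourier transform from the Python standard library; the function names and the synthetic two-component test signal are assumptions, and no window function is applied because the synthetic components fall exactly on band centers, which real recordings will not.

```python
import cmath
import math


def power_spectrum_db(samples, sample_rate_hz):
    """Return (band frequencies in Hz, power in dB relative to the strongest band)."""
    n = len(samples)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        power.append(abs(acc) ** 2)
    peak = max(power)
    floor = peak * 1e-12  # avoids log10(0) in bands without energy
    freqs = [k * sample_rate_hz / n for k in range(n // 2 + 1)]
    rel_db = [10.0 * math.log10(max(p, floor) / peak) for p in power]
    return freqs, rel_db


def dominant_frequency(freqs, rel_db):
    """Frequency of the band containing the most energy."""
    return freqs[rel_db.index(max(rel_db))]


# Hypothetical signal: weak fundamental at 1000 Hz, strong second harmonic at 2000 Hz.
sample_rate_hz = 8000
n_samples = 400  # 50 ms, giving bands 8000 / 400 = 20 Hz wide
signal = [
    0.3 * math.sin(2.0 * math.pi * 1000.0 * i / sample_rate_hz)
    + 1.0 * math.sin(2.0 * math.pi * 2000.0 * i / sample_rate_hz)
    for i in range(n_samples)
]
freqs, rel_db = power_spectrum_db(signal, sample_rate_hz)
print(dominant_frequency(freqs, rel_db))
```

In this example the dominant frequency is not the fundamental frequency, a situation that also occurs in real calls and is one reason to report clearly which of the two was measured.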

Whereas calls and notes in an anuran vocalization are subjective categories (see next section), a pulse can be 
defined in physics as a transient (time-limited) disturbance in a medium (i.e., a burst of sound energy). Pulses in 
bioacoustics have been defined as single unbroken wave train isolated in time by significant amplitude reduction 
(e.g., Broughton 1963). Although in many cases the identification of pulses is obvious, precisely identifying these 
units based on a strict application of their physical properties is not always straightforward. Where terminological 
accuracy is required and compliance with the physical definition of a pulse is uncertain (i.e., pulses are not clearly 
separated and thus not countable), a generally pulsatile structure (see below for definition) might nevertheless be 
evident and named as such rather than attempting to precisely measure or count single pulses. 
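
One possible way to make the identification of pulses reproducible is to derive an amplitude envelope from the oscillogram and to treat each stretch above a relative threshold as one pulse. The sketch below is an illustration under stated assumptions, not a recommended standard: the function names, the smoothing window of 17 samples, the threshold of 20% of peak amplitude, the minimum gap of 2 ms and the synthetic pulse train are all hypothetical choices that would need adjusting for real recordings.

```python
import math


def synth_pulse_train(n_pulses, pulse_ms, gap_ms, carrier_hz, sample_rate_hz):
    """Synthetic test signal: bell-shaped pulses separated by silent intervals."""
    pulse_len = int(round(sample_rate_hz * pulse_ms / 1000.0))
    gap_len = int(round(sample_rate_hz * gap_ms / 1000.0))
    samples = []
    for _ in range(n_pulses):
        for i in range(pulse_len):
            shape = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (pulse_len - 1))
            samples.append(shape * math.sin(2.0 * math.pi * carrier_hz * i / sample_rate_hz))
        samples.extend([0.0] * gap_len)
    return samples


def amplitude_envelope(samples, window_len):
    """Moving average of the rectified (absolute) waveform."""
    rectified = [abs(x) for x in samples]
    half = window_len // 2
    envelope = []
    for i in range(len(rectified)):
        lo, hi = max(0, i - half), min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


def find_pulses(envelope, sample_rate_hz, rel_threshold=0.2, min_gap_s=0.002):
    """Return (start, end) times in seconds of stretches above the threshold.

    Stretches separated by less than min_gap_s are merged, so that small
    ripples of the envelope are not counted as separate pulses.
    """
    threshold = rel_threshold * max(envelope)
    runs, start = [], None
    for i, value in enumerate(envelope):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            runs.append((start / sample_rate_hz, i / sample_rate_hz))
            start = None
    if start is not None:
        runs.append((start / sample_rate_hz, len(envelope) / sample_rate_hz))
    merged = []
    for run_start, run_end in runs:
        if merged and run_start - merged[-1][1] < min_gap_s:
            merged[-1] = (merged[-1][0], run_end)
        else:
            merged.append((run_start, run_end))
    return merged


def pulse_rate_hz(pulses):
    """Pulses per second, measured from the first to the last pulse onset."""
    return (len(pulses) - 1) / (pulses[-1][0] - pulses[0][0])


# Hypothetical note: five 10 ms pulses separated by 10 ms of silence.
sample_rate_hz = 8000
note = synth_pulse_train(5, 10.0, 10.0, 2000.0, sample_rate_hz)
pulses = find_pulses(amplitude_envelope(note, 17), sample_rate_hz)
print(len(pulses), pulse_rate_hz(pulses))
```

Measured pulse durations depend on the chosen threshold, which is one reason to report such settings. When neighboring energy peaks fuse so that the count changes strongly with small changes of the threshold, the sound is probably better described as pulsatile than as pulsed.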




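[Figure 4 panels: spectrograms of the same call at different FFT lengths (one panel labelled FFT = 256); horizontal axis: Time (ms); amplitude scale from 0 dB to -30 dB.] 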


FIGURE 4. Comparative spectrograms of a call of Dryophytes andersonii, all drawn with Hanning window function, showing 
the effect of different FFT resolution on the graphic representation of calls. Each spectrogram shows the identical 300 ms 
section of a recording. Note that with higher FFT settings, the spectral detail of the call representation increases. At lower FFT 
settings, the temporal pattern of the call is more clearly recognizable. Call recording taken from Elliot et al. (2009). Graphics 
produced with the R package Seewave (Sueur et al. 2008a). 


Sound categories in anuran calls 


As a first approximation to define the main appearance of a vocalization, it is useful to consider the very general 
sound categories proposed by Beeman (1998) for animals. These categories, here slightly modified and illustrated 
with anuran call examples (Fig. 5), refer to general properties of a sound, independent of the vocalization subunit to 
which the sound belongs (e.g., call, note) except for the definition of pulse, because presence and delimitation of 
pulses is instrumental for several of Beeman's (1998) sound categories. The general sound category to which an 
anuran advertisement call belongs is important information, and therefore should be mentioned and described when 
used in taxonomy. 

(1) Tonal sounds are those containing a single frequency component at any time instant, even if frequency or 
amplitude varies over time. In general, the frequency spectrum can vary between a pure tone, which contains only 
a single frequency, along a continuum to a so-called white noise, which contains all frequencies with equal energy 
(thus not being tonal). Animal vocalizations on the tonal side of the continuum typically are whistles, but might 
appear as clicks if the subunits are very short. Many bird and whale vocalizations classify as tonal, and within 
anurans, there are numerous species producing exclusively or partly tonal sounds. Some birds can produce 
polytonal sounds, where two independent sound-production systems operate simultaneously, but an equivalent 
double sound production is unknown in amphibians (but see Souza & Haddad 2003). Spectrograms of tonal sounds 
may or may not contain visible harmonics; in most tonal anuran calls harmonics are present, even if only detectable 
at short recording distances (see below). 

(2) Pulse-repetition sounds are series of energy bursts (pulses). Pulses are transient (time-limited) disturbances 
in the medium and in bioacoustics can be defined as short bursts of sound energy. Pulsed calls are common among 
anurans, and we extend the definition of this category insofar as the pulses should be separated from each other by 
distinctly reduced amplitude, but not necessarily 100%, which means there may be no completely silent intervals 
between pulses. 

(3) Sparse-harmonic sounds are those with a relatively small number of harmonically related spectral 
components, without a dominant role of one of the harmonics (i.e., without a very clearly defined single dominant 
frequency as observed in tonal sounds). Such sounds are rare in anurans, but some distress calls approach this 
category. 

(4) Dense harmonic sounds have a larger number of harmonically related spectral components. Some anuran 
distress calls qualify for this category. A pulsatile structure is visible in the oscillogram but with a relevant amount 
of sound amplitude between the energy peaks. A distinct structure of harmonics is recognizable in the spectrogram, 
but with spectral components between the harmonics. 

(5) Pulsatile-harmonic sounds are a combination of tonal or harmonic components with an important 
proportion of amplitude modulation. Some remains of a harmonic structure are visible in the spectrogram but 
sounds are emitted over a wide and continuous band of frequencies. Alternating amplitude modulation is 
recognizable in the oscillogram but without silent intervals between the energy peaks. The example shown in 
Figure 5 (Andinobates fulguritus) has signals emitted with a broad bandwidth with distinct harmonic structure (but 
without tonal component), and with energy clearly concentrated in a narrow frequency band. 

(6) Spectrally-structured pulsatile sounds are emitted over a wide frequency band with one or more spectral 
peaks but without a visible structure of harmonics. Alternating amplitude modulation can be recognized in the 
oscillogram, but usually without discrete energy peaks that could unambiguously be referred to as pulses. The 
example shown (Fig. 5; Amietia angolensis) shows a call where energy is concentrated at different frequency bands 
but without a clear harmonic structure. Further, the pulse structure is indistinct, with some pulses rather well 
delimited in the beginning of the call, but these energy maxima becoming denser and fused in the second half of the 
call; a clear count of pulses is therefore not possible, complying with the definition of pulsatile (see below) and 
thereby differing from pulse repetition calls. 

Further important properties of animal sound, also frequently observed in anuran calls, are modulations of 
amplitude and frequency. These terms describe whether the frequency and amplitude of a call (or of a subunit of a 
call) remain constant over its entire duration. For instance, the intensity of a call or a note can increase or decrease 
from their start to their end, and such a pattern can be described as amplitude modulation. Likewise, if the 
frequency (fundamental or dominant, of a call or note) increases or decreases over time, such a pattern can be 
described as an ascending or descending frequency modulation, respectively. 

FIGURE 5. Spectrograms and oscillograms of anuran advertisement calls conforming to the general sound categories 
proposed by Beeman (1998) for animals (and slightly modified herein). All graphics produced with the R package Seewave 
(Sueur et al. 2008a), from recordings of AmphibiawebEcuador.org (Hypsiboas tetete: Hylidae), Dendrobates.org (Andinobates 
fulguritus: Dendrobatidae), Vences et al. (2006) (Rhombophryne coronata: Microhylidae), Du Preez & Carruthers (2009) 
(Ptychadena anchietae: Ptychadenidae; Tomopterna marmorata and Amietia angolensis: Pyxicephalidae), Cocroft et al. (2001) 
(Ceratophrys cornuta: Ceratophryidae), Elliot et al. (2009) (Dryophytes andersonii: Hylidae). All spectrograms at Hanning 
window function, 512 bands resolution. 


Units and terms recommended for the description of anuran calls 

While the term ‘call’ is well established when referring to anuran vocalizations (see Wells 2007; Toledo et al. 
2015a), different terms and different definitions of the same term have been used in other animals. Complex 
vocalizations of birds are often named songs (Baker 2001; Catchpole & Slater 2008), whereas the term ‘calls’ is 
usually used to refer to sounds of lower complexity emitted by birds in a non-reproductive context. Sounds 
produced by orthopterans and cicadas (e.g., Robinson & Hall 2002), and even by flies (e.g., Kyriacou & Hall 1986; 
von Philipsborn et al. 2011) and diplopods (Wesener et al. 2011) are often named songs as well, even if these are 
much more stereotyped and less complex than songbird vocalizations. In mammals, human speech apart, 
vocalizations are more commonly referred to as calls, while the complex vocalizations of gibbons and bats are 
named songs by some authors (e.g., Cowlishaw 1992; Clarke et al. 2006; Smotherman et al. 2016). Subaquatic 
sounds can be emitted by insects (e.g., Sueur et al. 2011), crustaceans (e.g., Popper et al. 2001), cetaceans and other 
marine mammals (e.g., Edds-Walton 1997), anurans (Wells 2007), and by a multitude of fish species (Sisneros et 
al. 2016). The latter are often described by onomatopoeia such as ‘hum’, ‘grunt’ and ‘growl’ (e.g., McIver et al. 
2014), or functional categories such as agonistic and submissive sounds (e.g., Colleye & Parmentier 2012). 

Given the complexity of sounds emitted by animals, it is tempting to propose complex classification schemes. 
For instance, bird songs have been subdivided in a variety of subcategories (e.g., Shiovitz 1975) of which as many 
as 28 have been compiled by Thompson et al. (1994), including syllable, note, bout, phrase, trill and element. 
These authors also proposed a complex formula system for bird songs, which however has not been adopted by 
many ornithologists. Also for anurans, there would be a plethora of possibilities to define and propose a rather 
particularized terminology. We are, however, convinced that the continued use of the basic and well-established 
units call, note and pulse, if properly defined, is more appropriate to efficiently describe anuran vocalizations. We 
are aware that terminology and definition of units and structures in anuran vocalizations will continue to be a 
matter of debate, as classification of natural phenomena into fixed human-made categories by principle has to fail 
to a certain extent. Several authors already provided definitions of units and terms (e.g., Heyer et al. 1990; 
Schneider & Sinsch 1992; Duellman & Trueb 1994; Glaw & Vences 1994; Kohler 2000; Toledo et al. 2015a), but 
these were not necessarily in complete agreement, not followed, or in part inconsistently applied in subsequent 
works. Herein, our proposed definitions and terminology are aimed at reaching a maximum consensus among 
anuran taxonomists without ignoring or losing aspects of logic and current knowledge. Even if far from being 
perfect and criticizable for various reasons, sticking to a standardized terminology in practice, as far as possible, 
will prevent many of the current pitfalls in anuran call descriptions, comparisons and their interpretation. This will 
lead to a more standardized approach and, consequently, to a more stable taxonomy. We here focus on 
advertisement calls as the most relevant functional call category for taxonomy, defining basic acoustical units first, 
followed by further useful terms. 

(1) Call.—We define a call as the main acoustic unit in a frog vocalization (Figs. 6-7). In advertisement calls, 
theoretically, this functional entity is responsible for mate recognition. Calls are separated from other calls by silent 
inter-call intervals, typically longer (often several times longer) than the call. A single call can often be emitted 
(i.e., not as part of a coherent series of defined duration). 

(2) Note .—Calls are often broken into notes. These are smaller subunits, almost always separated by intervals 
of silence (i.e., 100% amplitude modulation), with the duration of these intervals being short relative to the 
duration of the note (often shorter than, or not much longer than the note itself). Silent intervals and notes are 
typically long and distinct enough to be discernable by the human ear. A call consists of a single note if no such 
subunits can be distinguished. 

(3) Pulse .—As stated above, a pulse by principle has a physical definition that can be applied to any discrete 
sound unit. Some of such units should better be termed calls or notes in bioacoustics, but they can also be pulses by 
physical concept. For anuran bioacoustics, we suggest restricting the term pulse to sound bursts within calls or 
notes. As defined here, a pulse is the shortest, undividable unit in anuran vocalization. We recommend restricting 
the bioacoustics term pulse to short undividable sound units, typically in the range of 5-50 ms, although longer 
sounds without amplitude modulation (usually tonal), strictly speaking would fall into the physical definition of the 
pulse category as well. 

Anuran vocalizations can be tonal, pulsatile, or pulsed. In a pulsed call, it is usually possible to distinguish all 
or most single pulses when analyzed. Although in most cases the identification of such a unit as pulse will be 
uncontroversial, difficulties might arise in complex vocalizations. Pulses are often separated from each other by an 
amplitude modulation of less than 100% (i.e., no completely silent inter-pulse intervals), but in some cases, spaced 
pulses with silent intervals do occur (Fig. 8). Because pulses in our proposed definition are a basic unit, no 
subdivisions such as subpulses should be used. Also, for clarity, we do not suggest the use of the term pseudopulse 
(de Araujo et al. 2011) which has been coined to avoid strictly complying with the physical definition of pulse. 
Where a call contains two different types of pulses (e.g., separated by intervals of different length, or by different 
degrees of amplitude modulation) it might be an option to describe these as different pulse types or as primary and 
secondary pulses, or to use the term pulse group, but we here refrain from attempting definitions for any of these 
categories. Pulsatile notes as defined here are neither tonal, nor clearly pulsed, but apparently exhibit some barely 
quantifiable alternating amplitude modulation. In the literature, such pulsatile notes have often been described as 
being noisy, referring to their acoustic character caused by alternating amplitude modulation (Fig. 8). 
FIGURE 6. Hierarchy of main units and subunits proposed for the description of anuran vocalizations. Call, note and pulse are 
primary units (in gray boxes). Call is the fundamental unit which might consist of a single note or several notes. In call 
descriptions, units can consist only of subunits in top-down direction of decreasing hierarchy. Pulses are defined here as the 
smallest, undividable unit. 
(4) Call group/call series. —Calls might be arranged in call groups which are separated from other such groups 
by longer periods of silence; spacing of calls in call groups may change, sometimes in a predictable pattern, or be 
regular; in the latter case, a call group can also be named a call series. 

(5) Note group/note series. —Equivalent to call groups and call series, notes can also be arranged into note 
groups (with each note group being separated from other such groups by silent intervals of longer duration than the 
inter-note intervals). If within such a group of notes the inter-note intervals are regular, then the note group might 
also be called a note series. If a call consists of several notes, spaced with regular or irregular intervals but without 
subdivision into distinct note groups or note series, then there is only a single note group or note series per call, and 
the use of these terms is discretionary. However, we suggest that the term note group should not be used to replace 
the term call. Instead, we recommend that descriptions should for instance state that the call is a series of notes, or 
that the call consists of several note groups. 

(6) Note (repetition) rate. —Note rate is defined as the number of notes repeated in a defined period of time 
within a call or within a note series. The value is usually provided as notes per second or minute. 

(7) Pulse (repetition) rate. —Pulse rate is defined as number of pulses repeated in a defined period of time 
within a note/call. The value is usually provided as pulses per second. 

(8) Note type. —Arrangement of notes in a call can be even more complex due to the existence of different note 
types that can be arranged in a regular or irregular succession. Often, calls consist of either a single note type or of 
2-3 note types, of which one might have a predominant signalling function towards males (territorial) and the 
other(s) be mainly directed at attracting females (e.g., in Eleutherodactylus coqui: Narins & Capranica 1976, 1978; 
Dendropsophus minutus: Haddad & Cardoso 1992; Toledo et al. 2015a). Calls consisting of a single note type can 
be named simple calls and calls consisting of different note types are complex calls. Delimitation of different note 
types remains somewhat subjective, but we recommend basing it on qualitative differences (differences in pulse 
structure and/or amplitude modulation, tonal versus pulsatile or pulsed, etc.) or on quantitative differences in more 
than one acoustic variable. This will avoid excessive subdivisions of calls into note types when only one call 
feature varies between sound units (e.g., continuous variation in duration). 

(9) Dominant frequency. —The dominant frequency of a call or note is defined as the frequency where most 
sound energy is concentrated within the whole power spectrum. In rare cases, it might be difficult to determine this 
frequency with maximum sound energy as there are two or three peaks of almost equal intensity with the most 
powerful varying from one call/note to another, especially in calls with harmonics. In these cases, it is 
recommendable to provide all respective frequency values for similarly powerful peaks. 

(10) Bandwidth. —Physically defined as the total range of frequencies present in the emitted sound. Often, the 
total range of frequencies is rather difficult to measure in field recordings of anuran vocalizations, even with high- 
end equipment, due to the overlap of low energy call components with the background noise. Therefore, in high- 
quality recordings the bandwidth should be measured at a given threshold level which should be clearly specified 
and kept constant in all measurements for comparison purposes. Measurements at -3, -6, and -10 dB from the peak 
amplitude will include the frequencies with 50, 75, or 90% of the sound energy in the call, respectively. The 
specific goals of each study and the signal-to-noise ratio of the set of recordings used often dictate the power level 
that can be chosen as bandwidth reference. For the sake of comparability, measurements at -10 dB threshold should 
be reported whenever possible, and the resulting frequency range reported as 90% bandwidth. However, in some 
recordings (strong background noise, calls of different species overlapping) such an objective measurement is 
impossible. Furthermore, taxonomists will often have to include in their comparisons and discussions old 
publications with graphical representations of the sounds, without having access to the original recordings. Even in 
such cases, a rough estimate of the range of frequencies encompassing the main proportion of sound energy 
attributable to the vocalization in target is often still possible by careful application of the power spectrum tool to 
various parts of the recording, and in the worst case (old publications) very roughly by visual inspection of the 
spectrogram. We suggest referring to such estimated frequency ranges as approximate prevalent bandwidth. In 
either case, when reporting bandwidth, the crucial values to mention are not the width of the frequency range but 
the actual minimum and maximum frequency values as these are biologically most relevant. 

(11) Fundamental frequency. —The fundamental frequency is the base frequency produced by the vocal cords. 
In many cases, it is the dominant frequency in the call or note. However, there are cases where higher frequencies 
may contain more energy compared to the fundamental frequency. As it can be rather difficult to identify which 
one is the frequency produced by the vocal cords, we consider pinpointing the fundamental frequency of secondary 
importance in these cases, although it should be reported in call descriptions if it can be unambiguously identified. 
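
The definitions of dominant frequency (9) and threshold bandwidth (10) can be illustrated with a minimal sketch. It assumes that a power spectrum has already been exported from the analysis software as two parallel lists (frequency in Hz, linear power); the function name `spectrum_summary` and the example values are hypothetical and serve only as illustration. In line with the recommendation above, the sketch returns the minimum and maximum frequency values rather than the width of the range.

```python
import math


def spectrum_summary(freqs_hz, power, threshold_db=-10.0):
    """Return dominant frequency and min/max frequency at a dB threshold.

    freqs_hz and power are parallel sequences describing a power spectrum
    (linear power, arbitrary units). threshold_db is relative to the peak.
    """
    freqs_hz = list(freqs_hz)
    power = list(power)
    if len(freqs_hz) != len(power) or not power:
        raise ValueError("freqs_hz and power must be non-empty and equal in length")
    peak = max(power)
    if peak <= 0:
        raise ValueError("spectrum contains no energy")

    # Dominant frequency: the bin holding the maximum power.
    dominant = freqs_hz[power.index(peak)]

    # Level of every bin in dB relative to the peak bin.
    rel_db = [10.0 * math.log10(p / peak) if p > 0 else float("-inf") for p in power]

    # Outermost bins at or above the threshold (bins in between may dip below it).
    above = [f for f, d in zip(freqs_hz, rel_db) if d >= threshold_db]
    return {"dominant_hz": dominant, "min_hz": min(above), "max_hz": max(above)}


# Hypothetical spectrum of a single note (illustrative values only).
freqs = [1000, 1500, 2000, 2500, 3000, 3500, 4000]
power = [0.001, 0.05, 0.2, 1.0, 0.5, 0.08, 0.002]
summary = spectrum_summary(freqs, power, threshold_db=-10.0)
```

Whatever threshold is chosen, it should be stated explicitly and kept constant across all recordings that are compared.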
Call-centered and note-centered terminological schemes 

The definitions of calls and notes in the previous paragraphs have the advantage of being widely applicable and 
universal in use. However, in discussions with colleagues and even among the authors of this review, we have not 
been able to reach a complete terminological consensus for all examples of anuran vocalizations. This particularly 
applies when a vocalization could be classified either as a call group (made up of various calls), or alternatively, 
as a call made up of various notes. 

There are several theoretical approaches to define the units call and note. One approach would be to define the 
unit 'call' as the acoustic entity which is functional in mate recognition, and it could be argued that this approach 
might be of value in taxonomy. In reality, this acoustic entity can either be a single unit (defined as a note in other 
approaches), or a combination of multiple units (otherwise defined as a call group, note group, or note series). 
Using the functional character as criterion for the definition of a call versus a note, in practice is hampered by the 
fact that only few experimental studies have tested which unit is used for mate recognition (e.g., Marquez et al. 
2008). 

Another terminological approach may consider the mode of sound production. In many species, a call (or a 
note) may correspond to the vocalization emitted during a single expiration, that is, one cycle of pumping air from 
the lung through the vocal cords. While this definition of an acoustic unit (note or call) can be a useful yardstick in 
some groups of anurans to help defining homologous units, it is obvious that in taxonomic practice it is often not 
suitable. Although expiration is thought to represent the predominant mode of sound production, multiple modes of 
sound production have been reported (see above) and a single expiration may either produce one or multiple 
distinct sounds. The mode-based approach is furthermore not applicable for species with non-expiratory or 
combined sound production systems. Furthermore, expiration in many cases is rather difficult or even impossible 
to observe. 

Given the practical difficulties to apply these theoretical approaches, we here suggest distinguishing two 
purely practical approaches for the use in taxonomy. We propose to use either a call-centered or a note-centered 
terminology (see Fig. 7 for examples), and to clearly state which of these two approaches is used. 

The call-centered approach typically starts defining a call as the main coherent sound unit (longer than a 
typical pulse), separated from other such units by a distinct period of silence (typically as long as, or longer than the 
call). If a call is subdivided into subunits (longer than pulses) separated by short periods of silence, then these 
subunits are considered notes. If calls are arranged in groups or series, then these are call groups or call series. In 
contrast, the note-centered approach starts defining an entire coherent unit of sound as call. If this coherent unit is 
further subdivided into subunits separated by (long or short) periods of silence, then these subunits are notes, and 
these notes might be arranged in note groups or note series. 
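
The difference between the two schemes can be sketched in a few lines of code. This is only an illustration under stated assumptions: sound units are given as (start, end) times in milliseconds, the parameter `gap_ratio` (how long a silent interval must be relative to the preceding unit to separate two calls) and the parameter `series_gap` (the silence that ends one coherent sound emission) are hypothetical thresholds chosen by the observer, and the example timings are invented.

```python
def call_centered(units, gap_ratio=1.0):
    """Group (start, end) sound units into calls, call-centered scheme.

    A unit starts a new call when the preceding silent interval is at least
    gap_ratio times as long as the preceding unit; otherwise it is treated
    as a further note of the current call.
    """
    calls = []
    for start, end in units:
        if calls:
            prev_start, prev_end = calls[-1][-1]
            gap = start - prev_end
            if gap < gap_ratio * (prev_end - prev_start):
                calls[-1].append((start, end))
                continue
        calls.append([(start, end)])
    return calls


def note_centered(units, series_gap):
    """Group (start, end) sound units into calls, note-centered scheme.

    Every unit is a note; a new call begins only after a silent interval
    longer than series_gap (same time unit as the input).
    """
    calls = []
    for start, end in units:
        if calls and start - calls[-1][-1][1] <= series_gap:
            calls[-1].append((start, end))
        else:
            calls.append([(start, end)])
    return calls


# Invented timings (ms): two stereotyped series of short sounds with long
# silent intervals between the sounds and a much longer pause between series.
series_example = [(0, 100), (400, 500), (800, 900), (1200, 1300),
                  (6300, 6400), (6700, 6800), (7100, 7200)]
# Invented timings (ms): rapidly repeated units with very short silent intervals.
rapid_example = [(0, 200), (250, 450), (500, 700)]

cc_series = call_centered(series_example)        # seven single-note calls
nc_series = note_centered(series_example, 1000)  # two calls of 4 and 3 notes
cc_rapid = call_centered(rapid_example)          # one call of three notes
nc_rapid = note_centered(rapid_example, 1000)    # one call of three notes
```

In the first example the two schemes disagree (seven calls in two call series versus two calls composed of notes), whereas in the second they agree on one call of three notes.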

Because anuran vocalizations often consist of sound units arranged in series, the two approaches will often 
differ in defining basic units of sound as either calls or notes, but in other cases will agree on the definitions (Fig. 
7). Uncertainty and discordance about definitions emerge especially when a stereotyped series of sounds is 
emitted. In line with the discussion of this topic in Glaw & Vences (1994), we recommend distinguishing species 
with a finite and relatively regular number of units in such series, and those with a highly irregular number of units 
in a series. As an example among Malagasy frogs (Glaw & Vences 1994; Vences et al. 2006), the species 
Gephyromantis eiselti emits a stereotyped series of tonal sounds (i.e., with regular intervals between the sounds). 
Such a vocalization can last up to 10 seconds and consists of 7-24 sound units, and is followed by a long period of 
silence. In comparison, many Malagasy microhylid frogs emit series of stereotyped tonal sounds that are not a 
priori limited in duration or number of notes, and this also applies to many other anurans such as Neotropical frogs 
of the genus Leptodactylus and Eleutherodactylus (see sections on individual call variation below). These frogs 
often emit series of stereotyped sounds uninterruptedly for many minutes. Indeed, sound emission can endure for 
most of the night. In such cases, defining each single sound as a call is straightforward, such as in Figure 7A (and it 
might be said that calls are arranged in stereotyped call series, with undefined number of calls per call series). The 
situation is more ambiguous for species such as Gephyromantis eiselti as mentioned above. In this species, one 
sound might also be defined as a call, and the entire vocalization as a call series with a defined number of 7-24 
calls (call-centered approach; see Fig. 7C). Alternatively, it might be preferable to define the entire series as one 
call, composed of a series of 7-24 notes, as done by Glaw & Vences (1994) (note-centered approach; see Fig. 7C). 
A note-centered terminology seems to be more appropriate in cases of complex calls composed of different note 
types (Toledo et al. 2015a). 
FIGURE 7. Concordance and discordance among call-centered and note-centered approaches to name sound units in anuran 
vocalizations. The upper two schematic spectrograms show examples where both approaches lead to the same categorization of 
sounds. (A) A single tonal sound is repeated after regular silent intervals of longer duration than the sounds. There is no defined 
duration of the series of sounds; if undisturbed, calling could go on for minutes or hours. In both approaches, one sound unit 
would be a call, and the note-centered approach would define each call consisting of a single note. (B) Series of rapidly 
repeated sounds, each composed of a series of bursts of sound energy. Because these bursts are < 10 ms in duration they are 
defined as pulses. The call-centered approach does not define each major subunit as call because the silent intervals between 
them are much shorter than the units themselves; thus, both approaches agree in defining the units as notes. (C) This species 
emits clearly defined and stereotyped series of sounds, each series being separated by variable intervals from the next series. 
The note-centered approach defines one coherent entity of sound emission as a call; hence, each sound series unit is a call, and 
the subunits are notes. In contrast, the call-centered approach defines each sound unit as a call (and each series as a call series) 
because it is separated from other such units by a long silent interval. (D) This species emits two distinct kinds of pulsatile 
sound units, of which one is much longer than the other. Because the combination of sounds is emitted as coherent entity, in the 
note-centered approach the entire sound emission is a call and the sound units are notes of two types, of which one is arranged 
in a series. In the call-centered approach, each sound unit is a call because they are separated by long silent intervals from the 
next unit. Two call types can be distinguished and one of these is arranged in a call series. 


[Figure 8 panels, oscillograms with time axis in ms: Yunganastes pluvicanorus (no pulses, tonal); Oreobates sanctaecrucis (distinct pulses, partly fused, no silent intervals); Dendropsophus coffea (no distinct pulses, narrow amplitude modulation, “pulsatile”); Hypsiboas marianitae (distinct pulses, separated by silent intervals).] 

FIGURE 8. Comparative oscillograms of notes of four anuran species, illustrating differences in amplitude modulation and 
respective differences in descriptive terminology, particularly the bioacoustical application of the term pulse as defined herein. 
The use of these alternative approaches depends to a degree on the subjective preference of the observer. As 
with many biological phenomena, it is clear that descriptions cannot fully account for the complexity of anuran 
vocalizations. It should, however, be kept in mind that the primary goal of such descriptions is to facilitate 
communication and research. Hence, it is of prime importance to keep the comparability among descriptions of 
vocalizations of related species (see Fig. 9). 
FIGURE 9. Example illustrating the need to consider homology aspects in terminology of anuran vocalizations. The calls 
shown are from four related species of mantellid frogs in the nominal subgenus of the genus Gephyromantis. The four species 
emit vocalizations consisting of a series of sound units (each corresponding to one expiration), with a defined number of units 
per series. All spectrograms are to scale; for G. boulengeri, an entire series is shown whereas the remaining spectrograms show 
parts of a series. In a note-centered terminology, one entire series would be a call, and each sound unit a note. In a call-centered 
terminology, in G. boulengeri, a series might be defined as one call (because no intervals of full silence occur between sound 
units), while in G. enki, each sound unit would be a call (separated by wide intervals of silence from the next call) and the series 
would be a call series. Either definition might be appropriate when looking at a single species, but in a comparative taxonomic 
study, it is of utmost importance to compare homologous bioacoustical entities and to apply the same name to them; hence, in a 
call-centered approach, the vocalization of G. boulengeri would also need to be dubbed a call series. Spectrograms made with 
the R package Seewave (Sueur et al. 2008a) at Hanning windowing function, 512 bands resolution. Note that we here refer to 
homology from the perspective of sound production (one unit corresponding to one expiration) and not from the perspective of 
signal content of the respective sound unit. 

We here distinguish primary descriptive units of anuran vocalizations (call, note and pulse) and secondary 
units (all others) (Fig. 6). In any description of an anuran vocalization, it should be defined what a call is, and a call 
by definition will consist of notes (either of a single note, or of several ones). The pulse is a primary unit because it 
is rather clearly defined. It should therefore be used to refer to vocalizations that contain units matching this 
definition (Fig. 8), but certain calls are not pulsed and this term will therefore not be used in the respective 
descriptions. 

Whether it is useful to use secondary units such as call group or note group in a description remains a 
discretionary decision. But whatever a researcher decides in this respect, it should be mandatory to clearly define 
the units and strictly respect their hierarchy, as illustrated in Figures 6-7. Ideally, all units used in a certain 
contribution should be indicated once in the published figures. 


Intraspecific variation in frog advertisement calls 


Advertisement calls of anurans are usually considered species-specific. Consequently, and despite the often- 
restricted gene flow among amphibian populations, many frog species have remarkably uniform calls across their 
distribution ranges. This statement appears paradoxical given the large body of literature dealing with such 
variation (reviewed in the following sections), but in the majority of cases, intraspecific call variation in anurans 
refers to relatively subtle differences in quantitative variables, and not to fundamental differences in call structure. 
As suggested by Vences & Wake (2007), it remains to be critically tested whether this might be due to circular 
reasoning —because frog populations with strongly divergent calls would be considered as distinct species by 
taxonomists, thereby eliminating instances of intraspecific variation (but see Amezquita et al. 2009 and Rowley et 
al. 2015 for examples of substantial call variation among populations). One obvious hypothesis that requires 
thorough testing is that call variation within species might originate and increase more slowly than does 
morphological variation, and whether this difference is accentuated in species from open areas showing little 
phylogeographic structure (Rodriguez et al. 2015a). In fact, examples of intraspecific morphological variation in 
amphibians (e.g., over elevational clines) are well known to herpetologists, although the amount of such variation 
has been rarely quantified. It includes variation in body size and hindlimb length, as in Palearctic brown frogs such 
as Rana macrocnemis and R. temporaria (Tarkhnishvili et al. 1999; Vences et al. 2013), body size and skin texture 
(e.g., in European widespread toads of the genus Bufo; Arntzen et al. 2013; Cadenovic et al. 2013; and invasive 
cane toads; Shine et al. 2011), body size in montane hylid frogs (Amezquita 1999), and color polymorphism in 
multiple species (reviewed in Hoffman & Blouin 2000). 

From an evolutionary perspective, it is likely that sexual selection plays a primary role in acoustic divergence 
between populations and species of anurans and other animals (reviewed by Wilkins et al. 2013), but 
environmental factors might be of considerable influence as well (e.g., Goutte et al. 2013; Vargas-Salinas & 
Amezquita 2013). 

Bioacoustical variation in anurans is generally studied at four levels: (1) within individuals, (2) between 
individuals of the same population, (3) between (geographically separate) populations of the same species, and (4) 
between independent evolutionary lineages (i.e., species). Drawing accurate taxonomic conclusions requires a 
correct distinction of individual and intraspecific variation (levels 1-3, taxonomically not relevant) from 
interspecific variation (level 4; highly relevant for species delimitation and identification). As we will review in the 
following sections, call variation within and between individuals of many frog species is great and is strongly 
influenced by individual motivation of the calling male due to intrinsic and/or extrinsic factors. This affects mainly 
call variables that can be defined as dynamic (see below), but also extends to emission of different call types as in 
some frogs call variation is exacerbated by the gradation and combination of different note and call types. 
Aggressive and advertisement calls might be part of a graded signaling system in which components of the call can 
be gradually adjusted according to social context, for example, by the distance between interacting males 
(Schwartz 1986; Wagner 1989a, c; Grafe 1995; Jehle & Arak 1998; Reichert 2013a; reviewed in Wells 2007 and in 
Toledo et al. 2015a). In some species, a clear distinction of aggressive and advertisement calls is difficult or 
impossible, and a hyperextended vocal repertoire is observed especially in highly motivated individuals (e.g., 
Amnirana nicobariensis, Dendropsophus minutus, Polypedates leucomystax, Boophis madagascariensis: Jehle & 
Arak 1998; Narins et al. 2000; Christensen-Dalsgaard et al. 2002; Toledo et al. 2015a). A well-studied example is 
the Neotropical frog Dendropsophus ebraccatus (Wells & Schwartz 1984; Wells 1989; Reichert 2010, 2011a, b, 
2013b), where advertisement and aggressive calls can grade into each other: in response to increasing acoustic 
competition, males increase the duration and reduce the pulse-repetition rate of the primary note by reducing the 
number of secondary notes at the same time, resulting in a highly escalated aggressive call that is less attractive to 
females than the advertisement call. 

Whereas there is general agreement that interspecific variation serves species recognition (Ryan & Rand 1993; 
Gerhardt & Huber 2002), the causes and possible functions of intraspecific variation in animal signals are 
insufficiently resolved (see below; Table 3). Nevertheless, species recognition and mate preference can be seen as 
part of the same process (Gerhardt 1982; Ryan & Rand 1993; Castellano et al. 2002b). Regarding the taxonomic 
importance of call traits, in theory we would expect a pattern of variation that is a (multimodal) continuum with 
increasing variation. This variation can be expected to be lowest within individuals, followed by variation between 
males in the same population, between conspecifics of different populations, and with highest divergences found 
between species (and theory predicts that this last case of divergence is particularly high in cases of sympatry) 
(Table 3; Fig. 10). 

Only few studies on call variation actually address more than one or two of the above-mentioned levels of 
intraspecific variation (e.g., Castellano & Giacoma 1998; Castellano et al. 2002b; Gerhardt 2012), or combine a 
comprehensive intraspecific approach with interspecific variation (Forti et al. 2016). However, assessing variation, 
no matter on which level, is taxonomically relevant and important for the understanding of species delimitation and 
species recognition, as well as for the understanding of speciation and signal evolution. 

If bioacoustical characters are to be useful for taxonomy, then a prerequisite is that their variation between 
species should exceed variation within species, and that bioacoustical divergence above a certain threshold and in 
certain traits should be indicative of species-level divergence. How such a threshold can be identified and which 
measurable variables of a call are most suitable for taxonomic purposes will require careful assessment on a case- 
by-case basis for different groups of anurans and different geographical scenarios. The following sections will 
review the available evidence on factors influencing call variation. 
FIGURE 10. Individual, intraspecific and interspecific call trait variation in Leptodactylus spp. exemplified by the trait call 
duration. (A) Individual variation in call duration during one night in individual A (one nightly calling activity phase of ca. 1 
hrs of calling; N = 4,401 calls; 18 November 2014; 29.1 to 29.4 °C); (B) Intraspecific variation: comparison of call durations of 
three individuals during one night of calling each (individual A: see above; individual B: n = 22,472 calls, ca. 5:20 hrs of 
calling, 27 November 2014, 23.0 to 23.4 °C; individual C: n = 15,561, 2:50 hrs of calling, 14 November 2014, 22.7 to 23.4 °C); 
(C) Interspecific variation: comparison of Kernel density estimates of call durations of three sympatric species (L. syphax: same 
three individuals as above, n = 38,434 calls; L. mystacinus: one individual, ca. 3:20 hrs of calling, n = 49,573, 24 January 2012, 
25.1 to 25.9 °C; L. vastus, one individual, ca. 1:10 hrs of calling, n = 3,649 calls, 16 November 2014, 25.2 to 26.4 °C). All 
recordings were done at the Research Station ‘Chiquitos’, Bolivia, with Song Meters SM2 (Wildlife Acoustics) or 
Olympus DM-550 recorders (sampling frequency 22.05 kHz; 16-bit resolution), and afterwards analyzed with software Raven 
Pro, version 1.4 (Bioacoustics Research Program 2011) using implemented amplitude detectors; statistics were done with R; 
only calls with high amplitude were considered (i.e., less intense ‘initial calls’ of a series were excluded; M. Jansen, 
unpublished data). 
Variation within individuals 


Energetics and endocrine control 

Frog calls can be very loud sounds, with sound pressures of 100—120 dB at 50 cm distance (e.g., Gerhardt 1975; 
Passmore 1981), and in many species of frogs, emission of advertisement calls is probably the most costly activity 
in terms of energy expenditure (Pough et al. 1992; Grafe et al. 1992; Grafe & Thein 2001), with metabolic rates of 
callers increasing up to tenfold over the resting metabolism (Wells 2007). Temporal variables of calls correlate with 
energy expenditure and male frogs can adjust them in the presence of other callers. Call variation is therefore 
influenced by energy budget and social context (Wells & Taigen 1986). Recent findings, however, indicate that 
relationships between calling activity and energy consumption may be complex (Carvalho et al. 2008), 
emphasizing the need for more comprehensive observations of amphibian traits related to calling activity, such as 
incorporating not only social context, but also breeding period, locomotor behavior, and calling strategies. 

Intricately related to these two factors are hormones (Moore et al. 2005; Wilczynski et al. 2005; Arch & Narins 
2009). Several studies have suggested that androgens and the neuropeptide arginine vasotocin (AVT) influence the 
calling behavior of frogs (Emerson & Hess 1996; Solis & Penna 1997; ten Eyck 2005), and higher androgen levels 
have been found in frogs exposed to conspecific vocalizations (Brzoska & Obert 1980; O'Bryant & Wilczynski 
2010). Intense calling leads to increased energy expenditure, and the associated stress causes corticosteroid 
hormone levels to rise (Emerson & Hess 2001; Leary et al. 2004). These have been found to be particularly high in 
individuals and species of high calling activity (Emerson & Hess 2001), and, when at high levels, corticosteroids 
can inhibit calling (Burmeister et al. 2001). The available results from different species are however not fully 
concordant, especially regarding the relationship of androgen levels and calling behavior (Moore et al. 2005; Wells 
2007). Evidence is more straightforward suggesting that AVT stimulates calling behavior (reviewed by Wilczynski 
et al. 2005) and, more importantly, that it influences call features such as call patterning, call duration and pulse 
number (Marler et al. 1995; Chu et al. 1998; Klomberg & Marler 2000; Trainor et al. 2003; Kime et al. 2007). 
Taken together, the available evidence suggests that social interactions, hormones, and energetics are tightly linked 
to each other in numerous ways, and all of these factors have the potential to influence those features of frog 
vocalizations that are often considered relevant for taxonomic purposes. 


Static and dynamic call traits 

In an influential work, Gerhardt (1991) suggested that patterns of variation in anuran call traits are related to 
female preferences, and that different traits encode different kinds of biologically significant information (i.e., they 
have different functions in interactions with conspecifics or heterospecifics). He proposed that on the within- 
individual level, less variable (static or stereotyped) traits might encode species recognition and populational or 
individual identity, whereas more variable (dynamic) properties might transmit information on mate quality 
(Gerhardt 1991). He proposed the classification into static and dynamic traits as ends of a continuum, by using 
thresholds of the coefficient of variation (CV = SD × 100/mean): static traits are those with CV values less than 5%, 
whereas dynamic traits are those having CV values above 10% (Gerhardt 1991). Distinguishing static vs. dynamic 
traits appears paramount for taxonomy: differences (between individuals or populations) in static characters can be 
hypothesized to be more taxonomically relevant than differences in dynamic characters {i.e., those characters that 
in the target group have been demonstrated to be dynamic in the same individual or population). 
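
The thresholds above can be applied as in the following minimal sketch. It assumes that SD is the sample standard deviation (n - 1 denominator), which the text does not specify; the function names, the label "intermediate" for values between 5% and 10%, and the example measurements are hypothetical and only illustrate the arithmetic.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: SD * 100 / mean (sample SD, an assumption of this sketch)."""
    return statistics.stdev(values) * 100.0 / statistics.mean(values)


def classify_trait(cv_percent, static_max=5.0, dynamic_min=10.0):
    """Place a trait on the static-dynamic continuum using Gerhardt's thresholds."""
    if cv_percent < static_max:
        return "static"
    if cv_percent > dynamic_min:
        return "dynamic"
    return "intermediate"


# Invented measurements from five calls of one male.
dominant_frequency_hz = [2480, 2500, 2520, 2510, 2490]
calls_per_minute = [20, 30, 25, 40, 35]

cv_frequency = coefficient_of_variation(dominant_frequency_hz)  # about 0.6 %
cv_call_rate = coefficient_of_variation(calls_per_minute)       # about 26 %
labels = (classify_trait(cv_frequency), classify_trait(cv_call_rate))
```

With these invented values, dominant frequency would fall at the static end and call rate at the dynamic end of the continuum.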

This concept of assessing call variation by CVs was originally suggested for the ‘within bout’ variation. It is 
mostly used on the within-individual level (CVw; Gerhardt 1991) and has become a standard method (Bee et al. 
2016). CVs may be used on other levels as well, such as the comparisons among/between individual males (CVa or 
CVb), and very often the relation of within- and between-individual variation (CVw / CVb) is used, for example, to 
test if individuals differ from each other by their calls (i.e., CVb > CVw; Bee & Gerhardt 2001; Bee et al. 2001, 
2010, 2016; Bee 2004a; Prohl 2003; Gasser et al. 2009; Feng et al. 2009a; Kaefer & Lima 2012; Gambale et al. 
2014; Forti et al. 2016). 
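
The comparison of within- and between-individual variation can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited study: the within-individual CV is taken as the mean of the per-male CVs, the between-individual CV as the CV of the per-male means, and the pulse-rate values are hypothetical.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation in percent (sample SD * 100 / mean)."""
    return stdev(values) * 100.0 / mean(values)


def within_and_between_cv(calls_by_male):
    """Return (CVw, CVb) for a dict mapping male ID -> trait values per call.

    CVw: mean of the CVs computed separately for each male (assumption).
    CVb: CV of the per-male mean values (assumption).
    """
    cv_w = mean([cv(values) for values in calls_by_male.values()])
    cv_b = cv([mean(values) for values in calls_by_male.values()])
    return cv_w, cv_b


# Hypothetical pulse rates (pulses/s) of three males, four calls each.
pulse_rate = {
    "male_A": [50.1, 49.8, 50.4, 50.0],
    "male_B": [55.2, 54.9, 55.5, 55.1],
    "male_C": [45.3, 45.0, 45.6, 44.9],
}

cv_w, cv_b = within_and_between_cv(pulse_rate)
print(f"CVw = {cv_w:.2f}%, CVb = {cv_b:.2f}%, CVb/CVw = {cv_b / cv_w:.1f}")
```

A ratio CVb/CVw well above 1 indicates that the trait varies more among males than within males, which is the precondition for individually distinctive calls.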

However, it is important to mention that Gerhardt (1991), as well as several subsequent (mostly experimental) 
studies, suggested that static properties are under stabilizing or weakly directional selection, because females often 
prefer values at or near the mean of the population, and that dynamic properties seem to be under directional 
selection, because females tend to prefer extreme values (Gerhardt 1991; Castellano & Giacoma 1998; Wollerman 
1998; Friedl 2006; Reinhold 2011). 

Reinhold (2009) conducted a comprehensive literature review and did not find evidence for a general bimodal 
pattern of variation (i.e., static vs. dynamic) in acoustical advertisement call traits in insects and anurans, 
confirming Gerhardt’s (1991) characterization of a continuum framed by these two extremes. A main problem in 
characterizing variation of call traits (and thus their position in the static-dynamic continuum) was that their 
variation increased with the duration of the analyzed calls. Because the time span over which measurements are 
taken increases the number of influential factors, it is likely that more trait variation is inferred from longer calls. 
According to these results, Reinhold (2009) concluded that the variation of acoustic signal traits cannot be used to 
classify traits into two groups. However, in a subsequent meta-analysis of sexual selection strength and trait 
variability in anuran and insect sounds, Reinhold (2011) showed that traits under stronger selection had lower 
variation even after controlling for signal duration, supporting the hypothesis that lower CVs might be caused by 
stabilizing selection through female preferences (Gerhardt 1991; Castellano & Rosso 2006). Gerhardt’s suggestion that 
static properties are more important in species recognition and that dynamic properties are more important in mate 
choice most probably still holds (Gerhardt 1991, 1994b). Thus, from a taxonomic perspective, a clear-cut 
distinction of static vs. dynamic traits is not crucial, but comparing coefficients of variation of traits can be very 
informative. 

It was initially proposed by Gerhardt (1991) and largely confirmed thereafter that spectral and fine temporal 
call traits (e.g., pulse rate) are typically more static call properties in frogs, whereas gross temporal traits were 
suggested to be typically more dynamic. We reviewed 52 original studies (many of which were included in 
Reinhold 2009) including 48 species that used Gerhardt’s (1991) concept for the assessment of within-individual 
call variation of frogs (Table 4). Although this review does not claim to cover the large body of literature 
completely (we searched for terms such as “coefficient of variation in frogs” in Google and Google 
Scholar), we think that most of the relevant studies were included, and thus reflect the status quo of research on this 
topic quite well. The results of this review confirmed the initial proposal, as we found that in most of the studied 
species dominant frequency was classified as a static trait (69% of 48 studied species; classified as a dynamic trait 
in only three species; Table 4). Moreover, the temporal traits pulse rate (27%) and call duration (21%) were also 
sometimes described as static (having a relatively low variation). This might be due to stabilizing selection (females 
preferring trait values that are close to the population's mean) or to morphological constraints (e.g., body size). 
Thus, individual variation in these traits is expected to be low and a large portion of between-individual variation is 
explained by variation in body size (e.g., Gerhardt & Huber 2002; Rodriguez et al. 2015b; but see next paragraphs 
for exceptions, such as change of spectral traits in relation to the social context and motivational state). Our 
literature review further confirmed that gross temporal traits were classified as ‘intermediate’ or ‘dynamic’ in most 
cases (e.g., call or note duration in 69% of reviewed studies; Table 4). These findings are in concordance with 
many behavioral studies that found plasticity in temporal call traits in different social contexts. 

Most of the studies reviewed (Table 4) used a set of calls that typically comprised 3-70 calls per individual (15 
on average; data not shown), with the exceptions of Friedl & Klump (2002), Larson (2004), Castellano & Rosso 
(2006), Rosso et al. (2006) and Reichert (2013a), who analyzed more than 250 calls per individual. Thus, the actual 
call variation during sustained calling through a defined period might be underestimated (see below), and our 
knowledge on the plasticity of vocalizations is still quite limited (Dyson et al. 2013). 


Social context and acoustic environment 

Because females tend to prefer males investing a high calling effort, many frogs increase calling rate (while 
simultaneously decreasing call duration), call duration (while simultaneously decreasing call rate), or call 
complexity (by adding notes or changing the number or the relative positions of different notes that compose the 
call) in choruses or in the presence of a competitor (e.g., Dendropsophus ebraccatus: Wells & Schwartz 1984; 
Dendropsophus microcephalus: Schwartz 1986; Schwartz et al. 1995; Dendropsophus minutus: Haddad & Cardoso 
1992; Morais et al. 2012; Toledo et al. 2015a; Dryophytes versicolor: Wells & Taigen 1986; Schwartz et al. 2001, 
2002; Rana dalmatina: Lesbarreres & Lode 2002). Further, some species that usually emit single notes might add 
notes in chorus situations (e.g., Lithobates clamitans: Bee & Perrill 1996), increase the rate of emission of 
aggressive calls (Acris crepitans: Wagner 1989a), or emit calls differing in structure or complexity 
(Dendropsophus ebraccatus, D. microcephalus and D. phlebodes: Wells 1988; Lithobates septentrionalis: Bevier et 
al. 2004; for reviews see Wells 2007; Dyson et al. 2013). 

An example of the plasticity of a purportedly static (spectral) trait under the influence of social context is 
reported in the white-lipped frog, Leptodactylus albilabris, where males shifted the dominant frequency of their 
calls over a mean range of about 100 Hz (and, in one case, by as much as about 400 Hz) towards the frequency of 
playbacks of other males (Lopez et al. 1988). Active alteration of dominant frequency in accordance with different 
social contexts was also shown by Wagner (1989b, 1992) in the cricket frog (Acris crepitans). Similarly, male 
green frogs (Lithobates clamitans) can lower the dominant frequency of their calls in response to broadcasts of 
conspecific calls (Bee & Perrill 1996; Bee et al. 2000), and males of Anaxyrus americanus emitted calls with lower 
frequencies when their calls overlapped with calls of other males (Howard & Young 1998). Finally, Reichert & 
Gerhardt (2013) showed that Dryophytes versicolor males decreased the frequencies of their aggressive calls in 
socially escalated situations. 

Spontaneous changes in vocalizations that prevent masking interference between sound signals have been 
well documented in birds (Brumm & Slabbekoorn 2005; reviewed in Brumm 2013), and recent studies indicate that 
some frogs vary call frequency, avoiding overlap with the spectral components of syntopically calling conspecific 
or heterospecific anurans, or with background noise (e.g., Lopez et al. 1988; Parris et al. 2009; Jansen et al. 2016a; 
reviewed in Schwartz & Bee 2013). For example, Both & Grant (2012) found that Hypsiboas albomarginatus are 
able to shift their calls to higher frequencies (from an average dominant frequency of ca. 2050 Hz to ca. 2150 Hz) in response 
to calls of invasive Lithobates catesbeianus. Green Frogs (Lithobates clamitans) and Northern Leopard Frogs 
(Lithobates pipiens) significantly increased the dominant frequency of their calls in response to traffic noise (L. 
clamitans: from ca. 480 to 860 Hz on average; L. pipiens: from 850 to 1200 Hz on average; Cunnington & Fahrig 
2010). Penna et al. (2005) showed that Eupsophus calcaratus increased call duration and call rate in response to 
abiotic background noise (wind, rain, creek and sea surf), and they suggested these vocal responses are adaptations 
that allow frogs to cope with high interference from sounds produced by the local acoustic environment. 

Fine-tuning of calls in response to microhabitat conditions might also be common. For example, Lardner & 
Lakim (2002) showed in a simulated tree-hole experiment that Bornean tree-hole frogs (Metaphrynella sundana) 
are able to adjust the dominant frequency of their calls to the resonant frequency of the hole from which they were 
calling. Ziegler et al. (2011) revealed a strong effect of habitat structure on temporal call parameters of 
Hypsiboas pulchellus, and found an effect of site temperature conditioning the body size of calling males at each 
site, thus indirectly affecting dominant frequency. Males of Hypsiboas prasinus generally call around ponds and 
lakes. However, during cold nights they frequently call from inside the water (Fig. 2) and under such circumstances 
notes showed lower dominant frequencies, longer durations and longer intervals between notes (Delgado & 
Haddad 2015). 


Temperature 

Temperature affects the rate of metabolic reactions in animals and, due to their ectothermic nature, many aspects of 
amphibian physiology are closely linked to the environmental temperature. Calling is an energetically expensive 
activity involving muscular contractions and hence is strongly dependent on operational temperature. In many 
species, environmental temperature regulates the vocal activity period (Wells 2007; Steelman & Dorcas 2010) as 
well as characteristics of the acoustic signals emitted. Temperature effects are more evident in those temporal 
features directly linked to muscular contractions, like call rate, pulse rate and call duration, whereas they tend to be 
subtle or nonexistent in spectral traits (Gayou 1984; Gerhardt 1994a; Prohl et al. 2007; Gasser et al. 2009; Lemmon 
2009; Bee et al. 2013a, b; Ziegler et al. 2015). 

Usually, researchers evaluate the strength of the association between environmental temperature at the time of 
recording and acoustic features by means of correlation tests or linear regression. Unfortunately, and probably 
due to space limitations, neither all the details of these tests nor the original underlying data are typically published, 
and this complicates a correct meta-analysis of the general effects of temperature on the different call features. In 
order to ascertain the extent to which the values of acoustic parameters are affected by temperature, we screened 
papers on temperature-dependent acoustic variation in anurans for those that provide original values of slope, 
intercept, and temperature ranges for all statistically significant linear regressions. We subsequently used these 
values to calculate the temperature coefficient (Q10) for each call feature in each species. The Q10 value 
represents the factor by which a given feature changes with a 10 °C increase in temperature. 
This coefficient allows a direct comparison across species and has been used in several studies of thermal 
dependency in amphibians (Navas 1996a, b; Navas & Bevier 2001). The temperature range over which 
measurements are taken need not be exactly 10 °C, and Q10 can be calculated with the following 
formula: 


$$
Q_{10} = \left(\frac{F_2}{F_1}\right)^{\frac{10}{T_2 - T_1}}
$$

where F2 and F1 are the maximum and minimum values of a given call feature along the reported regression line, and T2 and 
T1 are the corresponding temperature values. We focused our across-species comparisons on the four most 
commonly reported call traits: call rate, call duration, pulse rate and dominant frequency. We obtained adequate 
regression estimates allowing the calculation of Q10 values for these call traits in 20 different species (Table 5) and 
estimated the average and standard deviation of the Q10 values of each call property, taking the values of each 
species as data points. The results (Fig. 11; Table 5) show that call rate is the most affected trait, showing on average 
a two-fold increase in values as the temperature increases by 10 °C (mean ± SD; number of studies: Q10 = 2.03 ± 
0.39; N = 10). Pulse rate was the next most temperature-affected trait, also showing a nearly two-fold average 
increase with each 10 °C temperature increase (Q10 = 1.71 ± 0.23, N = 8). These two traits are likely dependent on 
muscular contraction in the studied species, which becomes more efficient at higher temperatures, resulting in 
shorter time intervals between actively produced sound units. However, our evaluation of these two traits may 
suffer from the different terminology used in published analyses, as the terms call and pulse may both correspond 
to the term note under different definitions. Call duration was negatively affected by temperature, with values 
decreasing by almost 40% as the temperature increases by 10 °C (Q10 = 0.63 ± 0.21, N = 12). Dominant frequency 
showed Q10 values close to one (Q10 = 1.16 ± 0.09, N = 8), indicating a very weak temperature dependence of this call trait. 
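
The calculation of the temperature coefficient from a published regression can be sketched as follows. The function names and the example regression (slope, intercept and temperature range) are hypothetical and serve only to illustrate the formula given above.

```python
def q10(f1, f2, t1, t2):
    """Temperature coefficient: Q10 = (F2 / F1) ** (10 / (T2 - T1))."""
    if t1 == t2:
        raise ValueError("the two temperatures must differ")
    return (f2 / f1) ** (10.0 / (t2 - t1))


def q10_from_regression(slope, intercept, t_min, t_max):
    """Q10 from a linear regression of a call feature on temperature.

    The feature values at the ends of the reported temperature range
    are predicted from the regression line and passed to q10().
    """
    f_at_t_min = intercept + slope * t_min
    f_at_t_max = intercept + slope * t_max
    return q10(f_at_t_min, f_at_t_max, t_min, t_max)


# Hypothetical regression: call rate (calls/min) = -10 + 2.5 * temperature,
# reported for recordings made between 12 and 22 degrees Celsius.
example_q10 = q10_from_regression(slope=2.5, intercept=-10.0,
                                  t_min=12.0, t_max=22.0)
print(f"Q10 = {example_q10:.2f}")
```

Swapping both the feature values and the temperatures leaves the result unchanged, so it does not matter which end of the temperature range is labelled 1 or 2, as long as each feature value is paired with its own temperature.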


Variation through time: body condition and fatigue 

Extrinsic (e.g., climate, social context) and intrinsic (e.g., metabolic rate, energy reserves, body condition) factors 
can affect within-individual hourly, daily or seasonal variation in calling or calling performance (e.g., Wells & 
Taigen 1986; Runkle et al. 1994; Schwartz et al. 1995, 2002; Docherty et al. 2000; Brepson et al. 2013; Humfeld 
2013; Ziegler et al. 2015). Castellano & Gamba (2011) studied nightly within-individual variation in call 
properties of Hyla intermedia and showed that, although pulse rate and call duration had been previously described 
as static traits (Castellano & Rosso 2006), they varied strongly with time elapsed in sustained calling, 
independent of environmental temperature variation. The authors hypothesized that this might be due to different 
strategies to avoid vocal fatigue, a phenomenon that might be widespread among species with prolonged vocal 
activity during mating (Humfeld 2013; Pitcher et al. 2014). Similarly, Jansen et al. (2016b) reported temperature-independent 
intra-individual variation during sustained calling of one male Leptodactylus mystacinus of about 12% 
of the mean dominant frequency (difference between minimum and maximum dominant frequency measured 
during the night / mean dominant frequency = 258 Hz/2136 Hz) and about 90% variation in call duration (39 ms/43 ms). 
Significant variation in call traits over time was also observed in Leptodactylus syphax. Within two hours of 
sustained calling, dominant frequency varied by 36% (576 Hz/1587 Hz; Fig. 12A), and call duration by as much as 60% 
(40 ms/67 ms; Fig. 12B). The causes of variability in call traits over time are still unresolved, but we can 
speculate that this plasticity might be linked to trade-offs between quality and quantity of calling performance, and 
energy limitations (Castellano & Gamba 2011; Humfeld 2013; Jansen et al. 2016b). 
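
The percentages given above follow from dividing the observed range (maximum minus minimum) by the mean value of the trait. A short sketch of this arithmetic, using only the figures quoted in the text (the function name is an illustrative choice):

```python
def relative_range_percent(value_range, mean_value):
    """Observed range (max - min) expressed as a percentage of the mean."""
    return 100.0 * value_range / mean_value


# (range, mean) pairs as quoted in the text.
reported = {
    "L. mystacinus, dominant frequency (Hz)": (258.0, 2136.0),
    "L. mystacinus, call duration (ms)": (39.0, 43.0),
    "L. syphax, dominant frequency (Hz)": (576.0, 1587.0),
    "L. syphax, call duration (ms)": (40.0, 67.0),
}

for label, (value_range, mean_value) in reported.items():
    percent = relative_range_percent(value_range, mean_value)
    print(f"{label}: {percent:.1f}%")
```

Note that this measure is sensitive to single extreme calls, so it is not directly comparable with a coefficient of variation computed from the same recording.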

Although not studied in detail, differences in air humidity may affect body condition and result in call 
differences (P.J.R. Kok, pers. obs.; see also Kok et al. 2013). Where precision is required, we recommend 
measuring this variable using a hygrometer. 


32 · Zootaxa 4251 (1) © 2017 Magnolia Press 

KÖHLER ET AL. 







[Figure 11: box plots; x-axis: call trait (CD, DF, PR, CR)] 

FIGURE 11. Box plots of Q10 values reported in the literature for four call traits in 20 different species of amphibians. The red 
line indicates no temperature effects (Q10 = 1). CD, call duration; DF, dominant frequency; PR, pulse rate; CR, call rate. 


Variation in call traits within and among breeding seasons 

Studies over longer periods of time are necessary in order to understand the repeatability of call traits and the ratio 
of inter-individual trait variation versus total trait variation (e.g., Howard & Young 1998). Repeatability is a 
phenotypic measure that estimates the upper limit of the heritability of a trait, but may also be useful to describe 
stereotypy of behavior (Boake 1989). Studies of call variation using statistical analyses on repeated recordings of 
the same individual between nights or seasons revealed ambiguous results (Sullivan 1982; Sullivan & Hinshaw 
1990, 1992; Gerhardt 1991; Runkle et al. 1994; Wagner & Sullivan 1995; Gerhardt et al. 1996; Howard & Young 
1998; Docherty et al. 2000; Bee & Gerhardt 2001; Humfeld 2013; overviews in Tarano 2001 and Reichert 2013b). 
For example, in Anaxyrus woodhousei there was a 7% change in dominant frequency, a 13% change in pulse rate 
and a 24% change in call duration over the course of a breeding season (Sullivan 1982). Runkle et al. (1994) 
revealed that calls of individual Dryophytes versicolor differed significantly between nights concerning calling rate 
and number of pulses per call. In contrast, many studies actually found high repeatability within or between 
seasons, at least for some of the studied call traits. For example, Docherty et al. (2000) demonstrated in a 
laboratory study that Hyperolius marmoratus were very consistent in call rate during a period of 21 days, and they 
assumed that call rate is a determinant of mating success. Howard & Young (1998) observed variation in call 
duration of Anaxyrus americanus between breeding seasons, but not in dominant frequency. Gambale et al. (2014) 
found no significant seasonal effects in Scinax constrictus advertisement calls. Smith & Hunter (2005) found 
moderate values of repeatability of dominant frequency in Litoria booroolongensis between years, but high 
repeatabilities for some temporal call traits (note duration, note rate, pulse number), leading them to the suggestion 
that these traits most likely have a heritable basis. 
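
Repeatability can be estimated as an intraclass correlation from a one-way analysis of variance with individuals as groups. The sketch below assumes a balanced design (the same number of recordings per male) and uses hypothetical dominant-frequency values; it illustrates the general approach and is not the exact procedure of any study cited above.

```python
from statistics import mean


def repeatability(groups):
    """Intraclass correlation from a balanced one-way ANOVA.

    groups: one list of repeated measurements per individual, all of
    equal length n. Returns s2_among / (s2_among + MS_within), with
    s2_among = (MS_among - MS_within) / n.
    """
    k = len(groups)
    n = len(groups[0]) if groups else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 individuals with equal numbers (>= 2) of recordings")
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])
    ms_among = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means)
                    for x in g) / (k * (n - 1))
    s2_among = (ms_among - ms_within) / n
    if s2_among + ms_within == 0:
        raise ValueError("no variation in the data")
    return s2_among / (s2_among + ms_within)


# Hypothetical dominant frequencies (Hz) of three males on three nights.
dominant_frequency = [
    [2100.0, 2110.0, 2090.0],
    [2300.0, 2290.0, 2310.0],
    [1950.0, 1960.0, 1940.0],
]
r = repeatability(dominant_frequency)
print(f"repeatability = {r:.3f}")
```

Values near 1 indicate that most of the variation lies between individuals; values near or below 0 indicate that recordings of the same male differ as much as recordings of different males.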
Dendrobatidae 

Oophaga pumilio   spectral: DF   temporal: CD   Prohl (2003) 

Oophaga pumilio   spectral: DF   temporal: PR, CR, DC, CD   temporal: PN   Meuche et al. (2013) 

Dicroglossidae 

Fejervarya limnocharis   spectral: DF   temporal: CD, CI, PN, PR   Marquez & Eekhout (2006) 

Leptodactylidae 

Engystomops pustulosus   spectral: initial and final frequency; temporal: whine shape, RT   temporal: rise shape   Prohl et al. (2006) 

temporal: CD, FT 

TABLE 4. (Continued) 

Species   Static   Intermediate   Dynamic   Authors 

Physalaemus cuvieri   spectral: MINF; SPL; temporal: CD   spectral: DF, MAXF, BW   Gambale & Bastos (2014) 

Raorchestes graminirupes   spectral: DF   temporal: CD, CRT, call and pulse fall time, PN, PR, PP, PD, PRT, pulse 50% rise time, pulse 50% fall time   Bee et al. (2013b) 

TABLE 5. Temperature coefficients (Q10) of four call features of anurans. For each species, the values were calculated 
from the regression equations reported in the references listed. Acoustic properties were abbreviated as follows. CD: call 
duration; CR: call rate; PR: pulse rate; DF: dominant frequency. 

Species                          CD     CR     PR     DF     Reference
Acris crepitans                  0.58   2.26   1.33   1.05   Wagner (1989a)
Alytes cisternasii               0.67   -      -      -      Marquez & Bosch (1995)
Alytes obstetricans              0.60   -      -      -      Marquez & Bosch (1995)
Anaxyrus fowleri                 0.55   -      1.95   -      Zweifel (1968)
Bombina variegata                0.61   2.03   -      1.26   Zweifel (1959)
Bufotes viridis                  0.50   -      2.00   -      Castellano et al. (1998)
Dendropsophus labialis           -      2.4    1.54   -      Navas (1996b); Luddecke & Sanchez (2002)
Eleutherodactylus auriculatus    0.80   2.25   -      1.19   Rodriguez (2010)
Eleutherodactylus coqui          -      1.63   -      1.20   Benevides & Mautz (2014)
Eleutherodactylus glamyrus       -      1.63   -      -      De la Nuez (2007)
Hyla arborea                     0.56   1.82   -      -      Friedl & Klump (2002)
Dryophytes versicolor            1.20   -      1.82   1.05   Gayou (1984)
Dryophytes wrightorum            0.54   -      1.56   -      Gergus et al. (2004)
Hyloxalus subpunctatus           -      2.04   -      -      Navas (1996b)
Leptodactylus fuscus             0.65   2.74   -      1.29   Heyer & Reid (2003)
Pleurodema thaul                 -      -      -      1.09   Penna & Veloso (1990)
Pseudacris crucifer              -      1.49   -      1.11   Sullivan & Hinshaw (1990); Zimmitti (1999)
Pseudacris triseriata            0.32   -      -      -      Platz & Forester (1988)
Pelophylax lessonae (bergeri)    -      -      1.82   -      Schneider & Sinsch (2007)
Pelophylax lessonae              -      -      1.65   -      Schneider & Sinsch (2007)
Average                          0.63   2.03   1.71   1.16
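
The averages in Table 5 and the means and standard deviations quoted in the text can be recomputed from the species values. In the sketch below the Q10 values are copied from Table 5, with the assignment of values to columns read from the table layout; the use of the sample standard deviation is an assumption, chosen because it reproduces the reported figures.

```python
from statistics import mean, stdev

# Q10 values per call trait, one entry per species with a value in Table 5.
q10_values = {
    "CD": [0.58, 0.67, 0.60, 0.55, 0.61, 0.50, 0.80, 0.56, 1.20, 0.54, 0.65, 0.32],
    "CR": [2.26, 2.03, 2.4, 2.25, 1.63, 1.63, 1.82, 2.04, 2.74, 1.49],
    "PR": [1.33, 1.95, 2.00, 1.54, 1.82, 1.56, 1.82, 1.65],
    "DF": [1.05, 1.26, 1.19, 1.20, 1.05, 1.09, 1.29, 1.11],
}

# Mean, sample standard deviation and number of species for each trait.
summary = {trait: (mean(values), stdev(values), len(values))
           for trait, values in q10_values.items()}

for trait, (avg, sd, n) in summary.items():
    print(f"{trait}: Q10 = {avg:.3f} +/- {sd:.3f} (N = {n})")
```

A Q10 below 1 (call duration) indicates a decrease with rising temperature, whereas a Q10 close to 1 (dominant frequency) indicates little or no temperature dependence.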



Nevertheless, during a single season call parameters can vary due to changes in body condition (Humfeld 
2013). Similarly, the processing of sound signals in a frog’s auditory midbrain may vary within the same 
reproductive season, possibly influenced by hormones. Goense & Feng (2005) found seasonal changes in 
frequency tuning and temporal processing in single neurons, with frequency tuning shifting from sensitivity 
to intermediate frequencies (700-1200 Hz) in winter to low frequencies (100-600 Hz) in summer in Lithobates 
pipiens. 


Variation among individuals of the same population 

Although species-specific in their general structure, anuran advertisement calls exhibit not only considerable 
within-individual variation, but also between-individual variation in many traits. A general pattern found in various 
studies is the direct relationship between within-individual and between-individual variation (e.g., Gerhardt 1991; 
Howard & Young 1998; Bee & Gerhardt 2001; Bee et al. 2001; see Table 4). Traits that are more static on the 
within-individual level (based on low CV values) often have low between-individual CVs, and within-individual 
dynamic call traits are often also more variable among calls of different male individuals. However, this does not 
necessarily mean that differences between males are higher in dynamic traits, or that traits on the static end of the 
static-dynamic continuum have per se a low between-individual variation (Gerhardt 2012). In fact, some traits 
usually considered to be static, such as dominant frequency, are strongly dependent on body size and thus highly 
variable among individuals. Such variation between individuals probably serves sexual selection (Gerhardt 1991; 
see below) and plays a role in male-male interactions. 
FIGURE 12. Nightly variation of dominant frequency (A) and call duration (B) in one individual of Leptodactylus syphax. One 
nocturnal activity phase of ca. 2 hrs of calling (n = 8,106 calls; 29.9 to 31.8 °C). Recording was obtained on 17 November 2014 
at the Research Station ‘Chiquitos’, Bolivia, with a Song Meter SM2 (Wildlife Acoustics; sampling frequency 22,050 Hz; 16-bit 
resolution), and afterwards analyzed with software Raven Pro, version 1.4 (Bioacoustics Research Program 2011) using 
implemented amplitude detectors; statistics were done with R; only calls with high amplitude were considered (i.e., less intense 
‘introductory calls’ of a series were excluded). Red lines show smoothed data (Local Polynomial Regression Fitting with 
span=0.05; M. Jansen & A. Masurowa, unpubl. data). 


Body size and individual recognition 

Body size effects are among the best-studied determinants of call trait variation between individuals (Rodriguez et 
al. 2015b). Body size is usually strongly correlated with spectral traits, and this correlation also holds between 
species, suggesting that fundamental and dominant frequencies are under morphological constraints, with smaller 
frogs (with shorter vocal cords) producing calls at higher frequencies (Gerhardt & Huber 2002; Gingras et al. 
2013). This correlation is almost universal in anurans, with few exceptions (Sullivan 1984; Sullivan & Malmos 
1994; Lingnau & Bastos 2007), including frogs that can actively adjust frequency depending on the context (see 
above). 

Whereas body size effects on frequency might characterize the vast majority of frog species, temporal traits 
have only rarely been suggested to be influenced by body size (Bee & Gerhardt 2001: duty cycle; Castellano et al. 
2002b: intercall duration; Prohl 2003 and Gasser et al. 2009: call rate; Toledo & Haddad 2009: call duration; 
Rodriguez et al. 2010a: rise time; Gambale et al. 2014: pulse number and note duration; Bee et al. 2013a, b: pulse 
rate and pulse rise time). 

For between-individual variation of spectral traits it is appealing to hypothesize ‘honest signalling’, in which 
signals transfer reliable information on male quality from sender to receiver. Given the physical body size 
constraint of frequency in anuran calls, females and competitors can interpret frequency traits as an honest signal 
informing about the body size of the sender and thus, possibly about its strength and quality (Davies & Halliday 
1978; Wells 2007). As an extended discussion of honest signalling is beyond the scope of this paper, we refer 
readers to existing reviews (e.g., Maynard Smith & Harper 2003; Searcy & Nowicki 2005; Greenfield 2006; 
Davies et al. 2012; and Bee et al. 2000 for an example in frogs). 
In a more general way, between-individual differences in call traits might encode individual identity (Gerhardt 
1991). Although some studies provided statistical evidence for the individual distinctiveness of frog calls 
(‘individual signature’: Shy 1985; Bee et al. 2001, 2010; Gasser et al. 2009; Feng et al. 2009b; Pettitt et al. 2013), 
only a few tested individual recognition experimentally. The so-called ‘neighbor stranger discrimination’ (NSD) or 
‘dear enemy-effect’ (Table 6) postulates that territorial males exhibit lower aggression levels towards established 
neighbors than towards unknown intruders. By habituation to familiar calls, males can avoid repeated and 
energetically costly territorial fights with familiar neighbors that already have their own territory with established 
borders. To date, NSD has been demonstrated in five anuran species (Lithobates catesbeianus: 
Davis 1987; Bee & Gerhardt 2001, 2002; Bee 2004a; Lithobates clamitans: Owen & Perrill 1998; Bee et al. 2001; 
Rana dalmatina: Lesbarreres & Lode 2002; Odorrana tormota: Feng et al. 2009a; and Anomaloglossus beebei: 
Bourne et al. 2001; Pettitt et al. 2013; Table 6) and experimentally rejected in one species 
(Oophaga pumilio: Bee 2003; Gardner & Graves 2005). 

TABLE 6. Experimental studies that investigated acoustically mediated social recognition in anurans (‘neighbor stranger 
discrimination’, NSD, or ‘dear enemy-phenomenon’). ‘Yes’ means that NSD was found, ‘No’ means that no 
experimental evidence supports NSD. ‘Call trait’ indicates the call properties that statistically contributed most to 
individual distinctiveness found in the respective studies. Fh1 = fundamental frequency; DF = dominant frequency; PD = 
pulse duration; PR = pulse rate; PI = pulse interval. 

Species                    NSD   Call trait        Reference
Ranidae
Lithobates catesbeianus    Yes   DF, Fh1           Davis (1987); Bee & Gerhardt (2001, 2002); Bee (2004a)
Lithobates clamitans       Yes   DF, Fh1           Owen & Perrill (1998); Bee et al. (2001)
Rana dalmatina             Yes   -                 Lesbarreres & Lode (2002)
Odorrana tormota           Yes   Fh1               Feng et al. (2009a)
Aromobatidae
Anomaloglossus beebei      Yes   DF, PD, PR, PI    Bourne et al. (2001); Pettitt et al. (2013)
Dendrobatidae
Oophaga pumilio            No    -                 Bee (2003); Gardner & Graves (2005)


Physical and physiological handicaps 

Although poorly studied so far, differences in calls of conspecifics within one population could be due to the 
existence of physically or physiologically impaired individuals. These could originate from injuries or 
malformations of sound-generating anatomical structures (e.g., larynx, lungs, vocal cords, vocal sac), from 
infestation of these structures by parasites (see Prohl et al. 2013), from fungal or viral infections, bacterial 
inflammations, from pathological alterations of the testes, or from deficiencies of the hormonal system. These 
circumstances could potentially produce aberrations from normal species-specific calls. Furthermore, hormonal 
pollution of the environment can affect acoustic characters in frog calls (Hoffmann & Kloas 2012). 


Variation among (geographically separate) conspecific populations 

The main function of communication signals, separating conspecifics from heterospecifics, should 
constrain the variation of intraspecific signals. However, despite this potential constraint, signals used in 
species recognition vary geographically. Wilczynski & Ryan (1999) suggested that acoustic signals “need not be 
subject to strong stabilizing selection operating on the species level” as it had been proposed by Dobzhansky 
(1937) and Paterson (1982). The amount of such variation has, however, rarely been quantified (e.g., Forti et al. 
2016). Littlejohn (1965) claimed that differences in call traits between geographically separated populations may 
equal those between species, which however is not surprising considering that many allopatric species have very 
similar calls. 
Divergence in communication signals between populations of the same species or lineage can result from 
various factors, such as (1) genetic drift (isolation-by-distance hypothesis), (2) natural selection, as adaptations to 
different habitats and environmental conditions, and (3) sexual selection, reinforcement and reproductive character 
displacement. In the following sections, we will address some of these influences on geographic call variation in 
more detail. 


Geography and genetics 

Geographic variation in frog advertisement calls has been reported in several species (for reviews see Wilczynski 
& Ryan 1999; Velasquez 2014; an overview of quantitative and qualitative trait divergences identified in 32 frog 
species is given in Table 7). Some studies found evidence of clinal variation in calls, either resulting from 
altitudinal differences between populations (Narins & Smith 1986; Luddecke & Sanchez 2002; O’Neill & Beard 
2011; Narins & Meenderink 2014; Baraquet et al. 2015), or from latitudinal and/or longitudinal differences (Smith 
et al. 2003a; Bernal et al. 2005; Prohl et al. 2007; Magrini et al. 2010; Faria et al. 2013; Baraquet et al. 2015; Forti 
et al. 2016). Additionally, geographic barrier effects on populational differences in frog calls have been recorded 
(Simoes et al. 2008; Magrini et al. 2010; Kaefer et al. 2012). Pleiotropic effects of body size may also contribute to 
call trait differences between different populations: populations might differ in body size and, as a consequence, in 
acoustic traits (Nevo & Capranica 1985; Narins & Smith 1986; Castellano et al. 2000; Smith et al. 2003b; 
Marquez-Garcia et al. 2009; O’Neill & Beard 2011; Micancin & Wiley 2014; Narins & Meenderink 2014; 
Baraquet et al. 2015). Although rarely tested, geographic variation in predation may also play an important role in 
the evolution and maintenance of mating signal divergence (Trillo et al. 2012). Castellano et al. (2000) found some 
evidence for different patterns of geographic variation in static vs. dynamic properties in the Bufotes viridis 
complex. Differences between distant populations were higher in presumably more static properties (pulse rate and 
fundamental frequency) than in dynamic properties. 

Geographic effects might also depend on the phylogeographic structure of the species. If numerous genetically 
divergent units are present (as a product of vicariant differentiation in rather deep time), the boundaries between 
these units can often be hypothesized to be concordant with call differences. On the other hand, species distributed 
over vast distances, without obvious phylogeographic breaks and with continuous gene flow, can be predicted to be 
bioacoustically uniform, or to have bioacoustical differences correlated with geographic distances between 
recorded individuals (‘isolation-by-distance’ sensu Slatkin 1993). Only a few studies have attempted to thoroughly 
test these predictions. Research on genetic variation and genetic divergence of frog populations including 
bioacoustical analyses has led to somewhat ambiguous results, possibly reflecting the differences in 
phylogeographic structure among the studied species (Ryan et al. 1996; Wycherley et al. 2002a, b; Lougheed et al. 
2006; Klymus et al. 2010; Twomey et al. 2015; for reviews see Wilczynski & Ryan 1999; Rodriguez-Tejeda et al. 
2014; Velasquez 2014; Rodriguez et al. 2015a; Forti et al. 2016). In Rheobates palmatus, a correlation between 
geographic, genetic and bioacoustical distances among populations has been found (Bernal et al. 2005) suggesting, 
to some degree, an isolation-by-distance mechanism. Amezquita et al. (2009) reported on genetic and bioacoustical 
variation in correlation with geographic distances in Allobates femoralis. In contrast, Funk et al. (2009) found 
correlations of genetic with call differences among populations in two Engystomops species, but no correlations 
with geographic distance. Similar results were reported in Velasquez et al. (2013) for the Chilean frog Pleurodema 
thaul. Forti et al. (2016) did not find any correlation between genetic and acoustical distances in Brazilian 
Proceratophrys moratoi. Jang et al. (2011) suggested a combination of a barrier model and an isolation-by-distance 
model to explain the genetic and call variation in Dryophytes japonicus. 
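
Correlations between matrices of geographic, genetic and bioacoustical distances, as discussed above, are commonly evaluated with a Mantel-type permutation test. The following is a minimal sketch of such a test and not the procedure of any of the studies cited; the five populations, their transect positions and dominant frequencies, and all function names are hypothetical.

```python
import math
import random

def upper_triangle(matrix):
    """Return the values above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(dist_a, dist_b, permutations=999, seed=1):
    """One-sided Mantel test: correlation of two distance matrices and its
    permutation p-value (rows and columns of dist_b are shuffled jointly)."""
    n = len(dist_a)
    a = upper_triangle(dist_a)
    observed = pearson(a, upper_triangle(dist_b))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(a, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)

# Hypothetical populations along a transect (km) and their mean dominant frequency (Hz)
position_km = [0, 100, 250, 400, 550]
frequency_hz = [3000, 3050, 3180, 3260, 3400]
geo = [[abs(p - q) for q in position_km] for p in position_km]
call = [[abs(f - g) for g in frequency_hz] for f in frequency_hz]

r, p = mantel(geo, call)
print(f"Mantel r = {r:.2f}, permutation p = {p:.3f}")
```

With only five populations the number of distinct permutations is small (120), so the attainable p-values are coarse; real datasets would need more populations for a meaningful test.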

Other studies revealed more complex relationships between genetic and acoustical divergence among 
allopatric populations. For example, in the tungara frog, Prohl et al. (2006) found that differences in calls along a 
550-km transect of 25 populations were explained better by geographic distance than by genetic distance. In a 
similar study along a transect from northern Costa Rica to western Panama on Oophaga pumilio (Prohl et al. 2007), 
the correlation between bioacoustical and genetic distance disappeared after call data were controlled for 
geographic distance. Cases of high genetic variation and low bioacoustical divergence between frog populations 
have also been reported (Heyer & Reid 2003: Leptodactylus fuscus). In general, the reported geographic variation 
in frog calls relates to differences in quantitative traits (such as dominant frequency, call or pulse rate, or note 
duration), but not to changes in general call structure. 
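
The pattern described above for Oophaga pumilio, in which the association between bioacoustical and genetic distance vanishes once geographic distance is accounted for, corresponds to a partial correlation close to zero. A minimal sketch using the standard first-order partial correlation formula; the three correlation coefficients are invented for illustration and are not values from the cited study.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations among pairwise distances between populations
r_call_gen = 0.48  # call distance vs. genetic distance
r_call_geo = 0.80  # call distance vs. geographic distance
r_gen_geo = 0.60   # genetic distance vs. geographic distance

r_partial = partial_correlation(r_call_gen, r_call_geo, r_gen_geo)
print(f"call vs. genetic distance, controlling for geography: {r_partial:.2f}")
```

In this constructed case the apparent call-genetics correlation of 0.48 is fully accounted for by the shared dependence on geographic distance. Because pairwise distances are not independent observations, the significance of such a coefficient would have to be assessed by permutation (a partial Mantel test) rather than by standard tables.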


Adaptations to different environments 


In addition to omnipresent spherical attenuation, complex environments promote so-called excess sound 
attenuation (Richards & Wiley 1980). Plant coverage or the choice of microhabitat or calling site have impacts on 
the production, transmission, refraction and reflection of acoustic signals (for reviews see Gerhardt & Huber 2002; 
Erdtmann & Lima 2013), potentially resulting in pronounced changes in their temporal and spectral properties 
(Forrest 1994). Temporal and spectral traits are differently affected by environmental features as calls propagate. 
On the one hand, higher frequencies are attenuated more strongly than lower frequencies in any environment and 
thus travel shorter distances than lower-frequency sounds (Kime et al. 2000). On the other hand, the 
temporal structure of calls may be distorted by echoes (Forrest 1994). 

Although many aspects of anuran communication have been studied intensively, the adaptation of signaling 
behavior in different environments (‘acoustic adaptation hypothesis’) has only rarely been examined—especially at 
the intraspecific level—and the existing studies had ambiguous outcomes (Wells & Schwartz 1982; Zimmermann 
1983; Penna & Solis 1998; Kime et al. 2000; Bosch & De la Riva 2004; Boeckle et al. 2009; Ziegler et al. 2011; 
Penna et al. 2013; Rohr & Junca 2013; Vargas-Salinas & Amezquita 2014; for reviews see Ey & Fischer 2009; 
Erdtmann & Lima 2013; Villanueva-Rivera 2014), and there is little evidence for different environments 
influencing signal structure in frogs. One example at the inter-specific level comes from the North American 
cricket frog (Acris crepitans complex). The two species, A. blanchardi, mainly occurring in grasslands, and A. 
crepitans, inhabiting pine forests, differ significantly in the dominant frequency of their calls (ca. 3000 Hz in A. 
blanchardi; 4200 Hz in A. crepitans; ca. 930 km east-west distance), call rate and call group duration. Although 
general call structure remains the same, grassland populations are much more diverse (in multivariate space) in 
overall call structure than are forest populations (Ryan et al. 1990; Ryan & Wilczynski 1991; Wilczynski & Ryan 
1999). 

In an among-species comparison, Rohr et al. (2016) have recently shown that stream-breeding frog species call 
at higher frequencies, a tendency also supported by the occurrence of ultrasound communication in torrent species 
(e.g., Feng et al. 2006; Arch et al. 2008). Although not studied at the intraspecific level, it can be expected that the 
low-frequency background noise produced by fast-flowing streams might exert a selective pressure on populations 
to increase the frequency of their calls (Boeckle et al. 2009). 


Character displacement 

The coexistence with heterospecifics may also influence signal variation in frogs. In areas where congeneric 
species occur in both allopatry and sympatry (e.g., across secondary contact zones), character displacement may be 
an outcome (Dobzhansky 1940; Brown & Wilson 1956; Grant 1972; Higgie et al. 2000; Panhuis et al. 2001). On 
the one hand, ecological character displacement (ECD) is a regular phenomenon in sympatric congeners and is 
usually attributed to selection caused by competition for limited ecological resources (Slatkin 1980; Howard 1993; 
Schluter 2000; Pfennig & Pfennig 2010). Reproductive character displacement (RCD), on the other hand, stems 
from selection against heterospecific matings (reviewed in Cooley 2007; Jang 2008; Pfennig & Pfennig 2009, 
2010; Gerhardt 2013). RCD, when mediated by diversification of advertisement calls, as an adaptive process, can 
arise from reinforcement mechanisms driven by selection against hybrids, resulting in pre-zygotic reproductive 
isolation (Dobzhansky 1937, 1940; Howard 1993; Coyne & Orr 2004). A definition of reproductive character 
displacement was given by Gerhardt & Huber (2002: 384) as “a geographic pattern in which differences in the 
communication systems of two taxa (or incipient taxa) are greater in sympatry than in allopatry because of 
selection in sympatry against costly mating mistakes (= reduced viability or fitness of hybrids, wasted gametes, or 
missed mating opportunities)”. 

Empirical evidence for character displacement in amphibians was found for an array of species (Blair 1955, 
1974; Littlejohn 1965, 1999; Loftus-Hills & Littlejohn 1992; Gerhardt 1994a; Marquez & Bosch 1997; Pfennig 
2000; Hobel & Gerhardt 2003; Pfennig & Pfennig 2005, 2010; Hoskin et al. 2005; Guerra & Ron 2008; Lemmon 
& Lemmon 2010; Richards-Zawacki & Cummings 2010; Rice & Pfennig 2010; Micancin & Wiley 2014; Jansen et 
al. 2016a), but this process may not apply to all genera (Toledo et al. 2015c). RCD in frogs regularly concerns 
acoustical signal traits related to mate finding and choice (e.g., temporal call parameters: Fouquette 1975; Lemmon 
2009; spectral call parameters: Hobel & Gerhardt 2003); however, other aspects of reproductive biology can also 
be affected (e.g., female preferences: Marquez & Bosch 1997; aggregation behavior of calling males: Pfennig & 
Stewart 2011; use of different types of calling perches: Hobel & Gerhardt 2003). Mechanistically, RCD in male 
courtship signals can be caused by selection against signal interference (Gerhardt & Huber 2002), or it can be an 
adaptation to diverging female preferences (Boul et al. 2007). Pfennig & Pfennig (2009) described how initial 
differences between traits of the competing species facilitate character displacement and how these differences can 
be emphasized in sympatry (see also Fig. 9.8. in Littlejohn 2001). 

Population differences in body size may facilitate displacement in dominant frequency of calls, and thus, the 
actual evolutionary causes of character displacement may be difficult to resolve (differences in body size caused by 
RCD to reduce acoustic interference, vs. character displacement in dominant frequency resulting from ECD 
through selection on body size or mass caused by other reasons; Micancin & Wiley 2014; Jansen et al. 2016a). 


Effects of hybridization 

Deviant calls observed within populations of frogs can be indicative of interspecific hybrids. While such instances 
are rare in nature, effects of possible hybridization need to be taken into account when interpreting call variation. 
Where documented, hybrid calls have often been described as being to some degree intermediate between the calls of 
the parental species (Duellman & Trueb 1994; Wells 2007). This has been described, among others, for Dryophytes 
cinereus x D. gratiosus (Mecham 1960; Gerhardt et al. 1980), Dryophytes avivoca x D. chrysoscelis (Gerhardt 
1974), Dryophytes versicolor x D. chrysoscelis (Mable & Bogart 1991), Dryophytes versicolor x Hyla arborea 
(Mable & Bogart 1991), Spea bombifrons x S. hammondii (Forester 1973), Geocrinia laevis x G. victoriana 
(Littlejohn & Watson 1976), Crinia pseudinsignifera x C. subinsignifera (Roberts 2010), and toads of the genus 
Anaxyrus (Blair 1956a, b; Zweifel 1968). In other cases, however, hybrids produce calls more similar to those of 
one of the parental species and sometimes, calls that include unique traits (e.g., in European water frogs of the 
genus Pelophylax; Wycherley et al. 2002b) where call variation also is in line with genome dosage effects in 
triploid hybrids (Hoffmann & Reyer 2013). An additional example documented here is that of the leaf frogs, 
Phyllomedusa distincta and P. tetraploidea, which produce triploid hybrids (Haddad et al. 1994; Gruber et al. 
2013). These frogs have very similar calls but subtle differences exist and the hybrids have an intermediate number of 
notes (Fig. 13). 

This example, along with other documented cases, suggests that in the contact zones of some closely related 
species, hybrids can be common and might in some cases explain high between-individual variation encountered in 
the wild. This could be particularly true for explosive breeders with scramble competition mating strategies, such 
as many bufonid toads (e.g., Haddad et al. 1990; Dodd 2013). In these species females have fewer opportunities to 
select males based on their call characteristics, which might explain why hybridization is relatively common and a 
substantial proportion of a population of calling males can potentially consist of hybrids (Malmos et al. 2001). 


Species identification and delimitation by advertisement calls 

The previous chapter has elaborated at length on the many instances and causes of intraspecific variation of 
bioacoustical traits in anurans. We have also emphasized that the amount of this intraspecific variation can at 
times—but not commonly—be substantial. This obviously implies that care needs to be taken when interpreting 
bioacoustical differences in a taxonomic context. It should however not distract from the fact that bioacoustical 
characters are extremely reliable and effective in diagnosing and delimiting anuran species. 

An important asymmetry using evidence from vocalizations in taxonomy—as with most other characters in 
integrative taxonomy—is that presence of differences potentially serves as evidence for taxonomic distinctness, 
while absence of differences does not serve as evidence for taxonomic identity (because the distinguishing 
evidence might be found in another character or another line of evidence). This reflects a general epistemological 
problem in integrative taxonomy: refuting a one-species hypothesis and delimiting a new species is 
straightforward, while conclusively refuting a two-species or multi-species hypothesis to lump the respective 
individuals into a single species is sometimes impossible (Miralles & Vences 2013). 

[Figure 13 panels: Phyllomedusa distincta; Hybrid (triploid); Phyllomedusa tetraploidea. Horizontal axis of each panel: Time (s).] 


FIGURE 13. Comparative spectrograms and oscillograms exemplifying the effect of hybridization on the call structure of 
anurans. The two tree frogs Phyllomedusa distincta (diploid) and P. tetraploidea (tetraploid) co-occur and hybridize in Ribeirao 
Branco, south of Sao Paulo state, Brazil, producing triploid hybrids (3n = 39) (Haddad et al. 1994; Gruber et al. 2013). Their 
advertisement calls are of similar structure and indistinguishable to the human ear, but have subtle quantitative differences 
(Student’s t-Test = 11.06; p < 0.0001): 6-11 notes (7.8 ± 1.02; n = 31 calls from 3 males) in P. distincta, 8-17 notes (12.8 ± 2.4; 
n = 58 calls from 8 males) in P. tetraploidea (4n). An intermediate range of 6-16 notes (9.7 ± 1.9; n = 78 calls from 9 males) is 
found in triploid hybrids (3n). Recordings obtained in the hybridization zone (Ribeirao Branco, Sao Paulo, Brazil) using a 
Nagra E tape recorder and a Sennheiser ME80 microphone, at air temperatures varying from 14.5 to 21°C. All recorded 
specimens were karyotyped to confirm their identities. Spectrograms made with the R package Seewave (Sueur et al. 2008a) 
with Hanning window function at 512 bands FFT resolution. 
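
The t value given in the caption of Figure 13 can be approximately recovered from the reported summary statistics alone. The sketch below assumes that the test was a pooled-variance (Student's) two-sample test on call-level values and that the ± figures are standard deviations; both are assumptions on our part, and the function name is illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic and degrees of freedom,
    computed from means, standard deviations and sample sizes."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / se, n1 + n2 - 2

# Notes per call: P. distincta (7.8 ± 1.02, n = 31) vs. P. tetraploidea (12.8 ± 2.4, n = 58)
t, df = pooled_t(7.8, 1.02, 31, 12.8, 2.4, 58)
print(f"t = {t:.2f}, df = {df}")
```

Under these assumptions the result is close to the reported value of 11.06. Note that this treats every call as an independent observation although several calls stem from the same male; averaging per individual before testing would be the more conservative choice.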

Although not all anuran species differ from each other in advertisement calls, there is no doubt that, overall, the 
degree of bioacoustical divergence between species exceeds that of within-species call divergence, particularly if 
comparing complete anuran communities rather than closely related species. Surprisingly, this question has rarely 
been explored using thorough statistical methods. This might in part be due to the fact that calls of different anurans 
are so different that defining homologous variables is highly contentious. 

Bioacoustical variation in species-rich frog faunas is already obvious from the fact that body sizes in such 
assemblages often vary over one order of magnitude, from <1 cm to >10 cm, with the respective variation also in 
spectral traits. Figure 14A shows the correlation of dominant frequency of the calls and maximum male body size 
for 155 Madagascan frogs of the family Mantellidae (based mostly on recordings published by Vences et al. 2006) 
illustrating that, similar to body size, the dominant frequency also varies over one order of magnitude (from 700 to 
7800 Hz) across this species assemblage. Variation in average note duration is even more extreme, ranging over 
more than two orders of magnitude (from 10 to 4600 ms). Based on an admittedly limited sample size (5-20 notes 
measured per species), and despite considerable intraspecific variation, it is obvious that for the majority of 
pairwise species comparisons, between-species variation vastly exceeds within-species variation in note duration 
(Fig. 14B). The data also show that extremes of intraspecific variation are skewed towards shorter note duration, 
often representing incomplete notes emitted at the start of calling activity. If we furthermore consider that (1) 
variation is to a large degree uncorrelated in the two dimensions discussed (dominant frequency and note duration), 
and that (2) overall call variation extends over numerous additional uncorrelated dimensions (e.g., inter-note 
intervals, pulsed vs. tonal notes, number of note types), it becomes obvious that for the vast majority of 
comparisons among species in this anuran assemblage, distinct and diagnostic call differences are expected. 
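
The statement that between-species variation exceeds within-species variation in most pairwise comparisons can be quantified as the proportion of species pairs whose observed ranges of a trait do not overlap. A minimal sketch of this calculation; the species labels and note durations are invented for illustration and are not the mantellid data shown in Fig. 14B.

```python
from itertools import combinations

def ranges_overlap(a, b):
    """True if the closed intervals a = (min, max) and b = (min, max) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

def non_overlap_fraction(ranges):
    """Fraction of all species pairs whose trait ranges do not overlap."""
    pairs = list(combinations(ranges, 2))
    distinct = sum(1 for x, y in pairs if not ranges_overlap(ranges[x], ranges[y]))
    return distinct / len(pairs)

# Hypothetical minimum and maximum note duration (ms) per species
note_duration_ms = {
    "species_1": (8, 15),
    "species_2": (12, 30),
    "species_3": (90, 140),
    "species_4": (600, 950),
    "species_5": (3800, 4600),
}

fraction = non_overlap_fraction(note_duration_ms)
print(f"{fraction:.0%} of species pairs have non-overlapping note durations")
```

Repeating the calculation for a second, uncorrelated trait and counting pairs separated in at least one of them illustrates why diagnostic differences become more likely as further call dimensions are added.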

However, due to overlapping call characteristics and within-species variation, exceptions to this general 
pattern are frequent. Especially in allopatrically distributed species complexes, call differences between species are 
often subtle and value ranges of all traits might vastly overlap. In 252 anuran species from Madagascar for which 
call data were available, Vences et al. (2008) estimated that 59 (23%) could not be reliably identified by their calls 
alone. For each of these 59 species there was at least another species with very similar calls. As stated above and 
also noted for other organisms (e.g., Tishechkin 2014), identity of acoustic signals does not provide conclusive 
evidence for taxonomic identity. However, under sympatric conditions (except for narrow contact and hybrid 
zones) it is highly unlikely that two distinct species will be characterized by identical vocalizations because 
selective pressures would promote bioacoustical divergence (but see Toledo et al. 2015c for discussion on 
sympatric and allopatric selective pressures). Hence, under conditions of sympatry, bioacoustical identity can serve 
as an indication, although not conclusive evidence, for taxonomic identity. In the next sections, we will provide 
guidelines for interpreting these rules in taxonomic practice and subsequently will summarize the value of different 
call traits for the purpose of taxonomy. 





FIGURE 14. Variation of two call traits within the Madagascar-Comoroan anuran family Mantellidae. (A) Correlation of 
dominant frequency and maximum male snout-vent length in 155 mantellid species. (B) Variation of note duration (mean, 
minimum and maximum values) among 171 species of mantellids, ordered by mean note duration (5-20 measurements per 
species). On Y-axis values are arranged along a logarithmic scale for graphical reasons (but scale shows original values in 
milliseconds, not log-transformed values). Notes defined following a note-centered scheme (cf. Fig. 7). 


Interpretation of call differences in taxonomic practice 

As demonstrated above, calls of frogs may vary greatly according to a multitude of factors. Only part of this call 
variation is due to selection in the context of pre-zygotic isolation, and important call traits—including those 
recommended for taxonomy herein—can differ due to temperature, body size, or other factors between and within 
individuals of the same species (Fig. 15). Hence, the main task, when using bioacoustics in taxonomic approaches, 
is the discrimination of intra-specific from inter-specific call variation. When doing this, we reiterate that the most 
important questions to be asked are: (1) If qualitative differences in calls are observed, do these really refer to the 
same type of call or note in each recording? (2) If quantitative differences are observed, can we exclude that they 
are caused by differences in temperature, body size and body mass, or motivational factors? 


Calling motivation 

Except for obvious differences in temperature or body size, under field conditions it is hardly possible to identify 
the factors that might cause call variation within or between conspecific individuals. However, many of these 
differences can be subsumed under the term calling motivation (Fig. 15B). Nocturnal species often start calling at 
dusk with rather irregular calls, which then become regular later in the evening. Males calling in choruses typically 
will be more motivated than those calling in isolation. As calling motivation may also strongly depend on social 
context, the observer should be aware of the possible presence of different call types, some of which might not be 
useful for taxonomy. Ideally, call comparisons would be more reliable if conducted on calls recorded from 
obviously motivated males, emitting calls at relatively regular intervals. 

[Figure 15 panels: (A) Intra-specific call variation: different call types; (B) Intra-specific call variation: differences in motivation.] 

FIGURE 15. Spectrograms and oscillograms showing intra-specific call variation in two treefrog species from Madagascar, 
exemplifying the need to account for the possibility of different call / note types and of strong influence of motivation when 
taxonomically interpreting bioacoustical differences. (A) Males of Boophis at Antsatramidola, Madagascar, were emitting two 
very different types of calls, one of which might represent a territorial call. However, we never heard the two calls from the 
same individuals and therefore in the field were convinced of the presence of two morphologically cryptic species. Subsequent 
genetic study revealed that the individuals were all conspecific with B. tampoka and emitting two different call types (Kohler et 
al. 2007; Vences et al. 2011). (B) Boophis ankaratra emits long series of notes. Typical note repetition rate is reflected by the 
call from Manjakatompo, emitted by a male in the presence of several other calling males. At Itremo, during a dry evening, 
only few specimens were sporadically calling and obviously were in a state of low sexual motivation; despite a slightly higher 
temperature, note repetition rate was much lower at this occasion. Spectrograms produced with CoolEdit Pro at Hanning 
window function, 256 bands resolution. 

FIGURE 16. Spectrograms illustrating qualitative call differences between closely related species (all mantellid frogs from 
Madagascar). All spectrograms show only a section of a longer series of notes. Gephyromantis eiselti and G. thelenae form a 
clade together with a third species (Kaffenberger et al. 2011). While G. eiselti emits series of tonal notes, G. thelenae emits 
much slower series of much longer pulsed notes at similar temperatures. Boophis majori and B. narinsi are sister species 
(Wollenberg et al. 2011) and differ extremely in note duration and note repetition rate (short clicks vs. long pulsatile sounds). In 
both cases, the species in each pair occur in syntopy and are extremely similar to each other in adult morphology. Despite 
distinct qualitative call differences, genetic divergences within each of the two species pairs are remarkably low (p-distances 
2.2-3.3% in a fragment of the mitochondrial 16S rRNA gene; Wollenberg & Harvey 2010; Vences et al. 2012a). In such 
extreme cases of bioacoustical divergence, and if the presence of different call types or recording artifacts can be excluded, 
bioacoustical data provide conclusive evidence for species level divergence. Recordings from Vences et al. (2006, 2012a); 
spectrograms made with the R package Seewave (Sueur et al. 2008a) at Hanning windowing function, 512 bands resolution. 


Qualitative call differences 

As taxonomic comparisons often imply using data from different populations recorded at different conditions, the 
key question is: could the observed acoustic differences represent within-species variation? When call differences 
are truly qualitative, then taxonomic inference is immediate. Very pronounced differences in call structure were 
named qualitative by Vieites et al. (2009), such as presence of different note types, strongly pulsed versus tonal 
calls, call series vs. single calls. If such differences are encountered (and motivational artifacts or different call 
types are reliably excluded; see Fig. 15), then the importance of detailed statistical comparisons of temporal or 
spectral parameters, correction for body size and temperature, and of the quality of recording equipment is 
secondary. Two examples in Figure 16 show two closely related species pairs of mantellid frogs that differ radically 
(qualitatively) in their advertisement call structure, and Figure 17A includes an even more drastic example for 
South American hylids. Such qualitative differences normally provide clear evidence for specific distinctness, 
although they might exceptionally also characterize populations whose species status is not ascertained, such as 
different arrangement of notes in populations of Allobates femoralis (Amezquita et al. 2009) or call structure in 
speciating populations of Engystomops petersi (Boul et al. 2007). Yet, in these and all other cases in which genetic 
data were available, the qualitative differences in calls were accompanied by genetic divergence (even if low), 
whereas differences in morphology were often not obvious (morphologically cryptic species). 


Geographic setting, concordance and comparability 

The interpretation of qualitative differences is particularly straightforward in cases of sympatric occurrence of the 
individuals with strongly divergent calls (Fig. 17A), although even in such cases, it needs to be carefully excluded 
that the distinct calls might represent different call types of the same species (Fig. 15A). But the distinction 
between sympatry and allopatry gains importance when call differences are quantitative only. As a general rule, 
divergent calls in sympatry (i.e., two potentially divergent calls recorded from specimens occurring at the same 
site, or at least from sites very close to each other), indicate specific distinctness more reliably than in allopatry 
(e.g., from two populations of frogs specialized to high elevations and occurring on mountains not connected by 
suitable habitat). 

Besides the geographic setting, an important factor to be considered is the concordance of bioacoustical 
differences with another species criterion (e.g., fixed differences in a genetic marker or in a morphological 
character). Again, such evidence gains importance in sympatry. If a representative group of specimens emits calls 
different from another group of specimens (even with weak quantitative differences such as statistical differences 
between call traits with overlapping ranges) and a second diagnostic difference concordantly distinguishes the two 
groups of specimens, then in a sympatric setting this is almost fully reliable evidence for the co-occurrence of 
two non-interbreeding lineages (Padial et al. 2010). 

A third important factor to be taken into consideration is the comparability of the available recordings. 
Taxonomic interpretation even of slight quantitative call differences is straightforward, if recordings are fully 
comparable (i.e., recorded in a sympatric setting at the same site and time). Interpretation becomes more difficult if 
the dataset contains calls (even from sympatric frogs) which were recorded under different conditions (e.g., 
different points of time, different temperatures, etc.). 


Quantitative call differences: sympatry versus allopatry 

When referring to closely related species, in most cases taxonomists have to deal with quantitative differences in 
calls. Closely related species often have a rather similar general structure in their advertisement calls, reflecting 
joint evolutionary history (Goicoechea et al. 2010). Although there is little experimental evidence, we here posit as 
an assumption that quantitative differences without overlap of the parameters measured (Fig. 17B) are a stronger 
indication for taxonomic distinctness than slight or moderate differences with overlap (e.g., Kohler et al. 2005b; 
Padial et al. 2008; Fig. 18B). 

In sympatric settings, if slight quantitative differences are detected, it is highly recommended to record as 
many calls and individuals of each respective group as possible (as minimum requirement in such a scenario, we 
recommend 10-20 individuals per species and at least 10 calls per individual), and collect representative call 
vouchers to provide a sufficiently convincing dataset. In sympatry, even such slight call differences can be 
indicative of species-level distinctness, if observed concordantly between two groups of calling individuals, but 
additional lines of evidence are usually necessary to support a taxonomic conclusion. However, cases of sympatry 
with slight quantitative call differences are probably rather rare, and in many cases, might reflect a contact zone 
among predominantly parapatric species, with possible hybridization. 
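
The minimum sampling recommended above for sympatric groups with slight quantitative differences can be checked before any analysis is run. The sketch below assumes that the analysed calls are tabulated with one individual identifier per call; this data layout, the function name and the default thresholds (the lower bound of the recommended 10-20 individuals, with 10 calls each) are illustrative choices.

```python
from collections import Counter

def meets_minimum_sampling(call_records, min_individuals=10, min_calls=10):
    """call_records: iterable of individual identifiers, one entry per analysed call.
    Returns True if enough individuals are represented by enough calls each."""
    calls_per_individual = Counter(call_records)
    well_sampled = [ind for ind, n in calls_per_individual.items() if n >= min_calls]
    return len(well_sampled) >= min_individuals

# Hypothetical dataset: 12 males with 10 calls each, plus one male with only 3 calls
records = [f"male_{i:02d}" for i in range(12) for _ in range(10)] + ["male_99"] * 3

print(meets_minimum_sampling(records))       # 12 sufficiently sampled males
print(meets_minimum_sampling(records[:50]))  # only the first 5 males
```

The check would be applied separately to each of the two groups being compared.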

More common will be situations where sympatric species share a general structural pattern but quantitative 
properties of their calls differ strongly, with little or no overlap in their range values. Such a difference in even a 
single call trait will potentially be as conclusive as a qualitative difference (Fig. 17B), as long as body size, 
temperature and motivational effects can be excluded. Given that these non-overlapping call parameters often work 
well as pre-zygotic isolation mechanisms (but exceptions occur; e.g., Mayer et al. 2014), genetic divergence 
between these sympatric and related species should be recognizable. 

FIGURE 17. Interpretation of advertisement call differences: (A) Example showing spectrograms and oscillograms with 
distinct qualitative call differences of two frogs in sympatry (syntopy), providing evidence for species-level divergence, despite 
a comparatively low level of genetic divergence (Kohler et al. 2010). (B) Example showing distinct and constant quantitative 
call differences of two frogs in sympatry (syntopy), providing clear indication of species-level divergence, corroborated by 
high genetic divergence (Vences et al. 2010b). Spectrograms produced with CoolEdit Pro at Hanning window function, 256 
bands resolution. 

FIGURE 18. Interpretation of advertisement call differences: (A) Spectrograms and oscillograms of calls of two allopatric frog 
species without any significant differences (evidence for species-level divergence by molecular genetics and tadpole 
morphology; Vences et al. 2010a). (B) Moderate structural call differences of two allopatric populations currently assigned to 
the same species. The calls of Blommersia wittei from Sambava and Andrakata are composed of clicking notes of a metallic 
sound, whereas at Nosy Be, Benavony, and Montagne d’Ambre, notes contain pulses and calls exhibit less distinct inter-note 
intervals. Spectrograms produced with CoolEdit Pro at Hanning window function, 256 bands resolution. 

The situation becomes more disputable when referring to advertisement call differences (or similarity) among 
allopatric populations. Related allopatric species need not call differently, as there is no need for premating 
isolation between them. As exemplified by Vences et al. (2010a), calls of a closely related pair of allopatric 
species (Boophis boehmei, B. quasiboehmei) are virtually identical when analyzed in-depth, although the two 
species can be distinguished by pronounced genetic divergence and slight differences in morphology (Fig. 18A). 
Hence, similar or identical calls among allopatric populations do not indicate conspecificity. 

On the other hand, call differences among allopatric populations do not necessarily indicate specific 
distinctness. As discussed in depth in previous sections, separated populations of a single species may be exposed 
to different environmental factors triggering the modification of behavior and vocalization. Such factors may for 
example involve the sympatric presence of a closely related species, promoting character displacement at one site, 
but not at the other. Given that in most cases the exact circumstances will be unknown, interpretation of allopatric 
call differences remains tricky. As a general but untested assumption, qualitative differences among calls of 
allopatric populations are more likely indicative of specific distinctness than quantitative differences. Concerning 
quantitative call differences in allopatry, which may in reality also constitute variation along a putative cline, 
taxonomic judgement is exceptionally difficult. 

As a general guideline, we strongly recommend, especially when dealing with advertisement call differences 
among allopatric anuran populations, to obtain a representative and dense geographic coverage of samples 
(including zones of contact or in close spatial proximity, if applicable), and to use additional character sets. These 
character sets may reveal differences or not, and spatial distribution of acoustic characters might concord with or 
contradict other character sets. In any case, combined datasets will bring more light to complex situations and, in 
many cases, will either support or prevent taxonomic revision (e.g., Padial et al. 2009; Glaw et al. 2010). 

In conclusion, taxonomic practice and correct interpretation of call differences (or similarities) strongly 
depend on the scenario observed. Whereas a sympatric occurrence of target taxa provides perfect conditions for 
the use of bioacoustics in taxonomy, allopatric scenarios always have to be analyzed with greatest care and 
interpreted with considerable scepticism. If evidence from different character sets remains inconclusive, we 
strongly recommend refraining from taxonomic action. In these cases, reporting the results of intraspecific 
variation is advisable, as they can potentially provide the background for future evolutionary and taxonomic 
studies. 


Usage of statistics in call comparisons 

The application of statistics in bioacoustical comparisons becomes more important when (1) the number of traits 
differing between two groups of individuals decreases, (2) differences are found in traits known to be more 
dynamic and strongly dependent on body size, temperature, and motivation, and (3) lower levels of differentiation 
are observed. It also needs to be taken into account that as sample size decreases, the arsenal of applicable 
statistical tools also decreases. When only a few individuals were recorded per group, statistical hypothesis testing 
might not be reliably applicable. When information from several individuals per group is available, it is important 
to make an informed statistical decision. In such cases, a suitable option would be to perform multiple linear 
regression analysis for each call trait including temperature and body size as independent variables. The residuals 
of those regressions can then safely be considered as not affected by the independent variables, and the regression 
equation might be used to adjust the observed values to a given temperature and/or size. 
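
The regression-based adjustment described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the data are hypothetical and noise-free (so the result can be checked by hand), the helper names `fit_ols` and `adjust` are our own, and the reference conditions (22 °C, 31 mm SVL) are arbitrary assumptions.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    rows: predictor tuples, here (temperature in deg C, SVL in mm).
    Returns [intercept, slope_temperature, slope_svl]."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def adjust(y, rows, coefs, ref):
    """Shift each observed value to the reference conditions `ref` by
    removing the fitted effect of each predictor's deviation from `ref`."""
    slopes = coefs[1:]
    return [yi - sum(b * (x - x0) for b, x, x0 in zip(slopes, r, ref))
            for yi, r in zip(y, rows)]


# Hypothetical per-individual data: (air temperature, SVL) and note duration in ms.
predictors = [(18.0, 30.0), (20.0, 32.0), (22.0, 29.0), (24.0, 33.0), (26.0, 31.0)]
note_duration = [120.0 - 2.0 * t + 0.5 * s for t, s in predictors]

coefs = fit_ols(predictors, note_duration)
# Residuals: the part of each value not explained by temperature and body size.
residuals = [yi - (coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)))
             for yi, r in zip(note_duration, predictors)]
# Values standardized to 22 deg C and 31 mm SVL.
adjusted = adjust(note_duration, predictors, coefs, ref=(22.0, 31.0))
```

With real recordings the residuals will not be zero, and it is the residuals or the adjusted values, rather than the raw measurements, that would be compared between groups.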

Alternatively, analyses of covariance could be used to search for call trait differences among groups of 
individuals (defined as the independent categorical variable), with temperature and body size as covariates; however, 
such ANCOVAs might not yield conclusive results in situations in which the different groups of specimens were 
recorded exclusively at different, non-overlapping temperatures. 

It is furthermore paramount to clearly mention the number of individuals included. Often, it is advisable to 
summarize all measurements per individual, and then use these average values as data points in statistical 
comparisons between populations, to avoid pseudoreplication (but mention minimum and maximum values). 
When multiple groups are compared, homogeneity of slopes must be tested before adjusting all data with a single 
regression equation. If slopes differ between groups, separate regression equations should be used for each group. 
Statistical comparisons will always be affected by sampling design and sample size and thus conclusions should be 
drawn with caution. 

An additional point is the increased probability of type I errors when multiple call traits are compared among 
multiple groups. It is important to realize that two different kinds of a priori questions can be assessed by statistics. 
On one hand, taxonomists might be interested in comparing two populations to find one or several characters 
distinguishing them, and then using these characters in an integrative framework as bioacoustical evidence for their 
species-level distinctness. In such analyses, type I errors would be of high impact as they could lead to unjustified 
recognition of species, and the use of statistical tools to correct for multiple testing is therefore absolutely 
indispensable (or the number of call traits to be compared can be reduced using ordination techniques such as 
Principal Component Analysis). On the other hand, researchers might just want to identify all the call traits by 
which two populations differ, without drawing further conclusions from any of the single tests; in such cases, 
multiple testing without adjustment might be justifiable to avoid an exaggerated lowering of statistical power and 
thus prevent type II errors (see Perneger 1998; Nakagawa 2004). 
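
As one possible illustration of a correction for multiple testing, the sketch below applies the Holm step-down procedure to a set of raw p-values. The choice of Holm is ours for the sake of the example (the text above does not prescribe a particular method), and the trait names and p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease along the sorted sequence.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from comparing four call traits between two populations.
raw = {"note duration": 0.004, "dominant frequency": 0.030,
       "pulse rate": 0.020, "call rate": 0.400}
adj = dict(zip(raw, holm_adjust(list(raw.values()))))
significant = [trait for trait, p in adj.items() if p < 0.05]
```

In this toy case three traits fall below 0.05 before adjustment, but only note duration remains below that threshold afterwards.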

Eventually, we need to carefully evaluate whether such statistical differences in advertisement call 
comparisons have taxonomic relevance, for example, with respect to female recognition and phonotaxis (Marquez 
et al. 2008). When comparing advertisement calls of allopatric populations with sophisticated statistical tools, it is 
rather likely that we will detect statistically significant differences in some temporal or spectral parameters, but 
these might not be indicative of species-level divergence (see above). If in sympatric situations two genetically or 
morphologically divergent groups consistently differ also in a bioacoustical variable, even if slightly, this can be a 
strong indicator for taxonomic distinctness. Multivariate statistics of call parameters might be useful to visualize 
these slight but consistent differences detected (see Toledo et al. 2015c). 

A thorough examination of the statistical methods to be applied in species delimitation based on phenotypic 
data alone or in combination with genotypic characters can be found elsewhere (Wiens & Servedio 2000; Guillot et 
al. 2012; Solis-Lemus et al. 2015) and it is beyond the scope of the present compilation. As a general rule, any 
given statistical test will usually require some assumptions to be met by the data and deviations from these 
could produce spurious results. Therefore, before embarking on complex statistics, researchers should make an 
adequate assessment of the problem and the data at hand in order to select the most robust statistical techniques 
applicable to their specific situation. 


Useful call traits in taxonomy 

Which bioacoustical traits are most relevant for taxonomy will strongly differ among anuran species. Especially if 
qualitative differences are absent, it is important to undertake informed choices of the quantitative parameters that 
will be compared. As discussed below, we suggest that for taxonomic inference weight should be allocated in 
decreasing order to note / call duration, dominant frequency, pulse rate and note / call rate. 

Our review has indicated that, among temporal variables, the duration of basic uninterrupted call units shows 
comparatively limited intraspecific variation (in some cases being on the static side of the continuum), is only 
moderately influenced by temperature and, probably in general, is not influenced by variation in body size. Call 
duration can vary over two orders of magnitude among species in the same family. This applies to the parameter 
named ‘call duration’ (in a call-centered approach) or ‘note duration’ (in a note-centered approach) (Fig. 7), and we 
flag it as a comparatively valuable taxonomic indicator. As a caveat, it is important to compare equivalent 
(homologous) units among species, apply the same terminology in each comparison (call-centered or note- 
centered), and be aware of possible influences of recording equipment, temperature, and methods used in analyses 
(see below). Always consider that, although regarded as static, substantial variation might also exist in call duration 
in some species, within individuals as well as among populations (average percent change = 37.5; minimum- 
maximum: 3.0-76.0%; Table 7), and that in some cases the values of note or call duration can almost double with 
every 10 °C shift in environmental temperature. 
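
The statement that a value can almost double with every 10 °C can be expressed with the standard temperature coefficient Q10, the factor by which a rate changes per 10 °C. The sketch below is a minimal illustration with hypothetical measurements; that this is the exact measure used elsewhere in this work is an assumption on our part.

```python
def q10(value_1, value_2, temp_1, temp_2):
    """Temperature coefficient: factor of change per 10 deg C,
    from two measurements taken at temperatures temp_1 and temp_2."""
    return (value_2 / value_1) ** (10.0 / (temp_2 - temp_1))


# Hypothetical pulse rate (pulses/s) doubling between 15 and 25 deg C.
pulse_rate_q10 = q10(20.0, 40.0, 15.0, 25.0)
# Hypothetical dominant frequency (Hz) barely changing between 18 and 24 deg C.
dominant_freq_q10 = q10(3000.0, 3060.0, 18.0, 24.0)
```

A Q10 near 2 indicates a strongly temperature-dependent trait, whereas a value near 1 indicates a trait that is largely unaffected by temperature.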

Dominant frequency is very static, as reported in numerous studies, and, unlike temporal call variables, shows 
average temperature coefficients close to 1 (Fig. 11). Furthermore, it is easy and uncontroversial to measure (except if 
several harmonics of similar energy exist; see Box III), it is recorded reliably by a variety of recording devices (see 
below), and it can vary over one order of magnitude among species at the level of anuran families. However, as an 
important caveat, it is strongly dependent on body size. This means that considerable variation can occur within a 
species, among individuals of different size, and differences encountered between populations might be a side 
effect of different body sizes of individuals in these populations, rather than indicative of different species. Percent 
change in dominant frequency between populations averaged 21.0% (minimum - maximum: 3.0^4.6%) with 
ranges of variation between populations reaching up to 1320 Hz in Leptodactylus fuscus (Heyer & Reid 2003) and 
1500 Hz in Oophaga pumilio (Prohl et al. 2007) (Table 7). Another caveat is the capacity of males of some species 
to change the dominant frequency during social interactions (e.g., Bee et al. 2000). 
TABLE 7. Overview of quantitative and qualitative trait divergences identified in studies on geographic call variation in 32 frog species. Here, we only considered two call traits, call duration and 
dominant frequency, because they were used in nearly all of the studies. Some studies mentioned in the text were re-analyzed (Faria et al. 2013) or excluded from this summary because no population- 
level data were presented by the authors (e.g., Kaefer et al. 2012) or because it was not possible to eliminate data from heterospecific populations (e.g., Lougheed et al. 2006; Simoes et al. 2008; 
Amezquita et al. 2009; Shen et al. 2015). Abbreviations: y = parameter found to be (most) responsible for geographic variation; n = no statistical significance for geographic variation found; CD = call 
duration; DF = dominant frequency; CR = call rate; ND = note duration; NN = number of notes per call; NR = note repetition rate. n.d.= no data, e.g. data were analyzed in a multivariate analysis and 
Engystomops pustulosus; y; n; CD, fall time, fall shape; initial frequency of the whine: ca. 1000 to 1100 Hz, CD: ca. 190 to 280 Hz; initial frequency of the whine: 9.1%, CD: 32.1%; Prohl et al. (2006) 

Pulse repetition rate (= pulse rate) is a third variable that has often been highlighted as being comparatively 
static within species, and there is evidence suggesting that pulse rate is anatomically constrained (Gerhardt 2001). 
Pulse rate is an important trait for taxonomy and is considered to represent the most important call property in mate 
recognition in some species (e.g., Littlejohn 1971; Gerhardt 1994a; Gergus et al. 1997). In some cases, however, it 
might be more dependent on temperature than other temporal call traits, such as call / note duration. In general, we 
hypothesize that (1) temperature affects the duration of silent intervals between sound units (if pulses are separated 
by silence), but (2) pulse rate is far less affected by temperature effects than call repetition rate (equals note 
repetition rate in a note-centered terminology). Furthermore, temperature-dependent changes of pulse rate probably 
do not strongly affect its value in mate recognition as the perception of pulse rate changes accordingly with 
temperature, at least in some species (Gerhardt 1978; Brenowitz et al. 1985). Still, possible temperature effects 
need to be taken into account when using pulse rate as a taxonomic character. 

Moreover, if a certain entity (call / note) of a frog vocalization contains pulses, the number of pulses per entity 
in many cases is a rather invariable trait, not depending on temperature or motivation, and thus potentially valuable 
for taxonomic purposes. 

Call repetition rate (or note repetition rate) is strongly affected by temperature and various other factors that 
might be summed up by the term motivation (see above). Call repetition rate seems to be particularly dynamic 
within individuals and is very strongly affected by the motivational state of the specimen. It should therefore be 
used in taxonomy only if calls of comparative individuals were recorded in exactly identical situations (i.e., 
syntopically at the same time), or if the particular vocalizations consist of longer series of rather regularly repeated 
calls / notes, and only after correcting for temperature. In any case, taxonomists should be rather cautious when 
comparing call repetition rates. 

Future studies might reveal other properties of anuran calls that are useful for taxonomy. One of these has been 
pointed out by Gingras et al. (2013): the spectral flatness, a quantitative measure of tonality, has turned out to be 
valuable in the identification of different clades of frogs. The spectral flatness, also known as tonality coefficient or 
Wiener entropy, is calculated as the geometric mean of the power spectrum divided by its arithmetic mean 
(Dubnov 2004). This measurement has not been tested so far as a tool for species delimitation in taxonomic 
approaches, but similar to dominant frequency, spectral flatness has been found to be inversely related to SVL in 
three of four frog clades studied (Gingras et al. 2013). 
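
The definition of spectral flatness given above can be sketched in a few lines. The example is only illustrative: it uses a naive discrete Fourier transform on short synthetic signals rather than real recordings, and the small floor applied to empty frequency bins (to avoid taking the logarithm of zero) is an assumption of ours, not part of the cited definition.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive DFT power spectrum over positive-frequency bins
    (DC and Nyquist bins excluded)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_flatness(power, floor=1e-12):
    """Geometric mean of the power spectrum divided by its arithmetic mean."""
    power = [max(p, floor) for p in power]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


n = 256
tone = [math.sin(2 * math.pi * 16 * t / n) for t in range(n)]  # pure tone
click = [1.0] + [0.0] * (n - 1)                                # single impulse

flat_tone = spectral_flatness(power_spectrum(tone))    # close to 0 (tonal)
flat_click = spectral_flatness(power_spectrum(click))  # close to 1 (noise-like)
```

Values near 0 correspond to tonal signals with energy concentrated in few frequency bins, and values near 1 to broadband, click-like or noisy signals.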

Among the most unreliable call properties for taxonomic purposes are the presence and number of harmonics. 
As shown herein (see sections on recording artifacts below), visualization of harmonics in a spectrogram depends 
on numerous non-biological factors such as recording distance, recording angle, recording level and saturation, and 
even on the selected FFT window width (Fig. 4). Even if such technical factors can be fully excluded, it is unlikely 
that the presence or absence of true harmonics in a call would be indicative of taxonomic differences. 


Recommendations for call descriptions used in anuran taxonomy 

In a description of an anuran vocalization for taxonomic purposes, it is recommendable to provide as much 
information as possible on the kind of sound that is being described. At first, the type of call should be mentioned 
(in taxonomy in most cases this will be the advertisement call). While an onomatopoeic description of the sound 
might be useful sometimes, it is more important to provide a general classification based on more objective 
categories such as those of Beeman (1998) (Fig. 5). It is important to define the description scheme used (call- 
centered or note-centered) and clearly define what is considered a call and a note. As a next step, detailed 
information on the general structure of the call should be provided and may contain the following parameters, if 
applicable (in parentheses the respective variables in call-centered definitions): (1) number of note (or call) types; 
(2) structure of note (or call) type(s); (3) number of notes per call (or calls per call series); (4) pulsed, tonal or other 
properties of notes (calls); (5) arrangement in groups or in series; (6) number of pulses per note (call); (7) 
amplitude modulation of notes (calls); (8) frequency modulation of notes (calls); (9) presence or absence of 
harmonics. 

In addition, the following temporal and spectral variables should be measured, if applicable: (1) call duration; 
(2) note duration; (3) duration of inter-note intervals (inter-call intervals); (4) note repetition rate within calls (call 
repetition rate); (5) pulse duration; (6) pulse repetition rate within notes; (7) inter-call intervals; (8) dominant 
frequency; and (9) bandwidth (or approximate prevalent bandwidth), i.e., the upper and lower frequencies of the 
call. 

Temporal and spectral measurements should be presented in a statistically meaningful way, such as mean ± 
standard deviation, and the range (minimum and maximum values). Precise information should be provided on the 
number of call units measured for each variable, and on the number of different individuals recorded and used for 
the analysis and description. If calls of several conspecific individuals were analyzed, in case of slight quantitative 
differences, it is recommendable to calculate and provide mean and range of values for each individual separately. 
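
A minimal sketch of such a presentation is given below, assuming hypothetical note durations and field numbers. It reports mean ± standard deviation, range and sample size for each individual separately, and then summarizes across individuals using one mean per individual, so that repeated notes of the same frog are not treated as independent data points.

```python
from statistics import mean, stdev


def summarize(values):
    """Format one variable as 'mean ± SD (min–max; n = N)'.
    Requires at least two values, as the sample SD is undefined otherwise."""
    return (f"{mean(values):.1f} ± {stdev(values):.1f} "
            f"({min(values):.1f}–{max(values):.1f}; n = {len(values)})")


# Hypothetical note durations (ms), keyed by field number of the call voucher.
note_duration_ms = {
    "FN-001": [102.0, 98.0, 100.0],
    "FN-002": [110.0, 114.0, 112.0, 112.0],
}

# One summary per recorded individual.
per_individual = {fn: summarize(v) for fn, v in note_duration_ms.items()}

# Across individuals: n refers to individuals, not to notes.
individual_means = [mean(v) for v in note_duration_ms.values()]
across_individuals = summarize(individual_means)
```

Reporting both levels makes explicit how many call units and how many individuals underlie each value.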

In taxonomy, comparative bioacoustics always should include good quality, comparative figures. The settings 
in the analysis software should be chosen in favor of a clear representation of the call structures. This may involve 
the need of careful filtering and/or adjusting the sensitivity settings of the spectrogram (but the unfiltered recording 
should always be made available). A spectrogram and the corresponding oscillogram should be provided at the 
same parallel time scale. When directly comparing calls of different lineages, populations, or species, all respective 
oscillograms and spectrograms should be provided at the same time and frequency scale, and same FFT setting, if 
possible. Often, it will be useful to provide additional spectrograms and/or oscillograms at different temporal 
resolution to illustrate different aspects of call structure—for instance, all calls of one call series in a temporal 
resolution of 10 s, and the pulses of a single call in a resolution of 1 s. 

Detailed information on recording conditions, namely precise locality, date, time, temperature (of air and/or 
water, depending on the calling site) at time of recording, weather conditions, air humidity, social context, calling 
behavior, etc., as well as a description of the recording gear (recorder, microphone), gear set up (e.g., sampling rate 
of the recorder), and all procedures conducted during analysis (e.g., filtering, spectral settings, resolution) is part of 
a call description. Especially for taxonomic purposes it is also of utmost importance to include metadata. Besides 
date, time, site and temperature of recording, information should be given that links the recording analyzed with the 
call voucher, including the field number and/or scientific collection number, the respective GenBank accession 
number of sequence data referring to the same specimen (if available), and number of the sound file deposited in a 
public archive. Additional suggestions can be found in the boxes below that summarize our hands-on 
recommendations. 


Verifiability of call recordings 
Voucher specimens and photos 

Any attempt to use call recordings in anuran taxonomy requires a reliable identification of the individual emitting 
the sound. Although sounding trivial, achievement of a proper identification of the individual might be exposed to 
several practical problems, and uncertainty should thus be clearly reported. If the respective permissions are 
available, collection of the recorded calling frog as a scientific voucher specimen is one of the most crucial steps 
for using bioacoustics in taxonomy (see the boxes below for precise recommendations on how to proceed with 
identifying and collecting voucher specimens). 

Information needed to relate the recording to the voucher specimen should be reported. This is preferably done 
by proper tagging of specimens with field numbers, and relating field notes, recording ID and miscellaneous 
observations to these numbers (see Kok & Kalamandeen 2008 for description of field methods for voucher 
specimen preparation). 

A well-prepared and documented voucher specimen is essential for any future revision of its taxonomic status. 
For maximum availability to the scientific community, it should be deposited in a well-managed and accessible 
scientific collection. Apart from the exact locality, date of collection and collectors’ names, its collection data 
should include the link to the respective call recording and its place of storage. 

Prior to preparation of a voucher specimen, the individual should be photographed in life. These photographs 
should at least include a dorsal and a ventral view of the specimen and should include a size scale. However, we 
strongly recommend taking as many detailed pictures of the living voucher as necessary to identify possible 
diagnostic characters in external morphology later (Kok & Kalamandeen 2008). These may include a lateral 
close-up view of the head (including the tympanum area), details of ventral surfaces of hands and feet (including 
webbing), details of hidden surfaces of legs and details of particular structures apparently characteristic for the 
species (e.g., prepollex, femoral, inguinal or gular glands, dermal appendages, etc.). These photos will later be of 
help to evaluate the identity of the calling frog and are exceptionally important in cases where there is no 
permission to collect specimens. In such cases, the photos, preferably in association with measurement of the 
snout-vent length and a buccal swab for DNA barcoding (if permitted), constitute the available dataset to allocate 
the sound recordings. 

If a preserved voucher is lacking, call information is often only of limited use for conclusively taking 
taxonomic decisions. Yet, when accompanied by detailed photographs, call information can be valuable for 
assessing call variation in species that are easily diagnosed by external characters, and for hinting at the presence of 
possibly taxonomically distinct units which require additional collection work. 

It is recommended that representative photos of the recorded individual are deposited together with metadata 
in publicly accessible photo (or audiovisual) archives (Toledo et al. 2015b). There are several options to deposit 
photos in online picture archives, such as AmphibiaWeb, which also allows the upload of sound files. However, in 
any case, make sure that the respective photos appear linked with information on the respective call recording. 

In cases where bioacoustics constitutes an essential part of taxonomic species delimitation and results in the 
description of a new species, it is highly advisable that authors select a call-recorded voucher as the holotype 
specimen. When advertisement calls are described for species that have already been named, it is recommendable 
to obtain recordings from their type locality (as well as obtaining topotypic voucher specimens), as this increases 
the chances of actually describing the call of a particular nominal species. However, in the latter case, careful 
morphological comparisons of newly collected call vouchers and original type specimens are warranted, as (cryptic) 
diversity at a single locality can be unexpectedly high, particularly in the tropics (e.g., Jansen et al. 2011; Gehara et 
al. 2014; Fouquet et al. 2016). 

In recent years, recordings of anuran calls from various geographic regions were published as audio CDs (e.g., 
Marty & Gaucher 1999; Rodel 2000; Cocroft et al. 2001; Marquez et al. 2002; Haddad et al. 2005; Vences et al. 
2006; Alonso et al. 2007; Elliot et al. 2009; Du Preez & Carruthers 2009; Kwet & Marquez 2010; Rosa et al. 
2011). These published sound files are potentially useful sources for call comparisons in taxonomy, but only if 
respective recordings are accompanied by data allowing for the verification of the identity of calls. While most 
booklets in such audio CDs provide information on recording locality, recording date and temperature, data on 
voucher specimens is usually lacking. Thus, in many cases the taxonomic allocation of calls in these publications 
must be considered to be potentially in error. Using published sound recordings in taxonomy requires great care 
(the same is true for comparisons with described calls in printed literature; see below) and it is highly advisable to 
contact authors of such recordings directly in order to verify published data or supplemental information. 


DNA barcoding 

The application of molecular genetics and its integration into taxonomic research on anurans has revealed a 
tremendous amount of hidden diversity, particularly in the most species-rich tropical regions (e.g., Kohler et al. 
2005a; Stuart et al. 2006; Fouquet et al. 2007; Crawford et al. 2010, 2013; Funk et al. 2012; Barej et al. 2015; Kok 
et al. 2017). A great part of this uncovered diversity is considered to represent different species (Fouquet et al. 
2007; Vieites et al. 2009). Most of this formerly undescribed species diversity is to some extent cryptic, adults of 
different evolutionary lineages being similar in external characters and thus very difficult to distinguish from each 
other by morphology alone. Furthermore, other integrative taxonomic studies revealed considerable genetic 
variation in what today is considered a single species (e.g., Gehara et al. 2014), or extreme morphological and 
chromatic polymorphism among individuals which are almost identical genetically (e.g., Kohler et al. 2010; Kok et 
al. 2012). Although these recent findings constitute a great progress in knowledge and understanding of species 
diversity and evolution, they potentially put in doubt published call descriptions (as well as descriptions of 
tadpoles, life history, etc.) if these cannot be reliably linked to one of the genetic clusters. This applies, for example, 
to the results of Gehara et al. (2014), who by applying molecular genetics discovered 43 divergent lineages in what 
was considered to represent the hylid frog Dendropsophus minutus and a few related South American treefrog 
species. With up to three different lineages occurring in sympatry, it becomes obvious that it is impossible to 
unequivocally allocate former call descriptions referring to the name D. minutus from any locality to one of the 
lineages identified, unless these are accompanied by a genetic identification of the call voucher (for a similar 
example in Africa, see Channing et al. 2013). 
Taxonomy based on call comparisons of specimens without molecular identification will, in view of the 
possibly large degree of cryptic diversity, always suffer from uncertainty of allocations to names or lineages, and 
thus contribute little to a sound taxonomy. In conclusion, we strongly suggest that call descriptions should 
wherever possible be accompanied by a DNA sequence of the recorded specimen, or at least of another specimen 
from the same population that also was reliably heard emitting the same call. Where collection of a voucher is not 
possible due to permit constraints, or because the species is of high conservation concern, non-invasive sampling 
techniques are an alternative to allow for making species identification of call recordings verifiable using 
molecular methods. If permitted, it is recommendable to obtain a tissue sample by toe-clipping or cutting a tiny 
piece of webbing. When obtaining tissue samples is not allowed, an alternative is to take a buccal swab, which will 
yield sufficient quantity and quality of DNA (if appropriately preserved) to at least amplify and sequence a 
fragment of the mitochondrial DNA. 

While reliable species identifications are still best achieved by sequencing a segment of the mitochondrial 16S 
rRNA gene for many anuran groups (Vences et al. 2005), it might be more useful in most cases to contribute to the 
global DNA barcoding efforts (Murphy et al. 2013) and sequence instead the ‘barcoding segment’ of the 
cytochrome oxidase subunit I (COI or cox-1) gene. Amphibian primers for this gene now exist and have been 
shown to work reasonably well (e.g., Smith et al. 2008; Crawford et al. 2010, 2013; Xia et al. 2012; Che et al. 
2012; Perl et al. 2014; Hawlitschek et al. 2016). Detailed protocols for DNA barcoding amphibians have been 
summarized by Vences et al. (2012b). 


Collection management of sound recordings 

Since the rise of appropriate mechanical devices, biologists have increasingly recorded and documented sounds 
from nature (see Ranft 2004 for a review). Sound recordings per se have a high scientific value yet most of them 
are not made available along with call descriptions, and are not appropriately archived. Recordings that are not 
housed in institutions or sound archives are at high risk of loss by material degradation or misplacement (Marques 
et al. 2014). As a consequence, efforts are being undertaken to meet the challenge of preserving, storing and 
managing audio and video recordings for subsequent generations of scientists, and making these data accessible to 
the public, via scientific institutions, sound archives and repositories (Ranft 2004; Obrist et al. 2010; Cugler et al. 
2011; Marques & Araujo 2014; Marques et al. 2014; Toledo et al. 2015b). 

Taxonomically, the highest relevance corresponds to recordings of specimens that were collected and 
deposited in zoological collections. These recordings should be stored, managed and cataloged along with the 
collected specimens in the same institutional collection, or in a sound archive or repository linked to the respective 
voucher in a museum (Obrist et al. 2010). The same is true for video sequences of calling anurans that increasingly 
are used in call descriptions and include important additional information, such as microhabitat, calling site or 
muscle contraction during sound production (Bee et al. 2013a, b). 

The International Bioacoustics Council (http://www.ibac.info/links.html#libs) provides a comprehensive list of 
links to all major sound archives (e.g., Tierstimmenarchiv Berlin, British Library Sound Archive's wildlife 
collection, Macaulay Library of Sounds, Fonoteca Neotropical Jacques Vielliard Brazil; see Table 8). Sound 
archives are important repositories of worldwide biodiversity (Ranft 2004; Toledo et al. 2015b), but depositing 
sound files in accessible collections is yet to become a universal practice (Toledo et al. 2015b). Given the 
importance of bioacoustics in frog taxonomy, it would much facilitate taxonomic work if all described calls were 
already deposited in sound archives. 

It has become best practice in biology to follow an open-access policy for repositories of data that are linked to 
published results (for an overview of major biological repositories see http://www.nature.com/sdata/data-policies/ 
repositories). This is common practice for DNA sequences where most journals require that they are deposited in 
the International Nucleotide Sequence Database Collaboration (http://www.insdc.org/) which includes GenBank, 
the DNA DataBank of Japan (DDBJ), or the European Molecular Biology Laboratory (EMBL). The Dryad 
repository (http://datadryad.org/) makes a variety of data available, and specialized image repositories exist as well. 
In a similar way, it will be important to establish a user friendly and open-access network of sound repositories for 
anuran calls. These should ideally be accessible (guaranteeing data sharing and replication of past studies) and 
institutional (increasing the chances of long-term maintenance). At present, 85% of the recordings available in 
wildlife sound collections are from birds (C. Araujo & P. Marques, pers. comm.). Online open-access is not a 
common practice for all data in the available anuran sound libraries (Table 8), although subsets of data are 
generally available. For the time being, we recommend submitting call recordings in their entirety to one of the 
major sound repositories (Table 8), and representative sections (in particular those that were used for producing 
spectrograms) also to AmphibiaWeb (http://amphibiaweb.org/) where they can be linked directly to the respective 
species accounts. 

We recommend speaking some baseline information (locality, date, temperature, social context, etc.) on the 
same track containing the recorded calls, as these data will then be unequivocally connected to the sound recording, 
facilitating its proper archiving. 


Technical equipment and software for call recording and analysis 
Recording equipment 

In the early years of bioacoustics research, different types of tape recorders were the first choice to record anuran 
calls in the field (Littlejohn 1998). Although these produced recordings of reasonable quality, they frequently 
suffered from mechanical or electrical problems caused by high humidity and rough field conditions. These 
problems sometimes resulted in artificial noise on the recordings, varying tape speed or complete malfunction. 
Later, digital tape recorders (DAT) in theory promised uncompressed high quality recordings, but the mechanical 
apparatus included turned out to be even more delicate and, thus, more prone to damage (Heyer 1994). MiniDisc 
recorders were apparently more robust, but suffered from excessive data compression and, as a result, from 
recognizable alteration of sounds. We recommend using modern digital recorders that save files on flash memory 
cards or hard drives. These are compact in size, are rather less susceptible to mechanical damage, provide a better 
frequency response and are comfortable in use. 

The market of mobile digital devices suitable for anuran call recordings has grown considerably, and the 
turnover is very fast. The mass production of integrated processors has made hand-held digital recording devices much 
more affordable, and the quality/price ratio has in general improved significantly. Most of these devices are 
designed for high-quality music recording and thus fulfill the requirements for frog call recording in almost all 
cases. We refrain from recommending any particular devices. However, we list some major companies/brands that 
have proved to produce suitable hand-held recorders: Marantz, Olympus, Roland/Edirol, Sony, SoundDevices, Tascam 
and Zoom. 

In any case, the recorder should allow manual adjustment of the recording level, as automated 
adjustment can lead to numerous artifacts. Furthermore, it must be possible to save digital sound files in an 
uncompressed format such as *.WAV (recording in a compressed sound format such as MP3 must be avoided). 
Depending on the intended usage, the recorder should be of solid build to withstand rough field conditions. The 
batteries required should be of a common and widely available type. A built-in speaker, even if of low quality, may aid in 
triggering calls in the field by playback of sound, but it needs to be considered that this method might interfere with 
the calling motivation of the recorded frog or elicit aggressive calls rather than advertisement calls. 

Some important properties of the recording equipment are the technical frequency response, frequency range 
and low distortion (Heyer 1994). Anurans are able to produce very low frequencies as well as ultrasonic sounds. 
For standard recordings, a flat frequency response of the microphone-recorder combination in the 60-16,000 Hz 
range is recommended for most species, but the wider the frequency range, the better. Recordings targeted at 
documenting ultrasounds require particular equipment (see below). A flatter frequency response usually implies 
more expensive, high-quality equipment. Detailed information on technical features of any recording device or 
microphone should be available from the manufacturer, if not provided with the manual. When different 
microphone-recorder combinations have been used, and only quantitative differences in the spectral domain are 
found between putative species, researchers should guarantee that differences in the dynamic ranges of the 
equipment are not influencing their results. An easy test to control this effect would be to simultaneously record a 
synthetic signal, with spectral properties encompassing the range observed in the study species, with all the 
recorder-microphone combinations used in the study. Variation in repeated spectral measurements of such 
recordings should be considerably smaller than that observed among the amphibian populations under study. 
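
A synthetic test signal of the kind suggested above could be generated as in the following minimal sketch. It is not part of the original protocol: the function name, the default tone length and the example frequencies are illustrative assumptions, and the frequencies should in practice be chosen to span the range observed in the study species.

```python
import math
import struct
import wave


def write_test_tones(path, freqs_hz, sample_rate=44100, tone_s=0.5,
                     gap_s=0.2, peak=0.5):
    """Write a mono 16-bit WAV holding one pure tone per frequency.

    Tones are separated by silent gaps so that each one can be selected and
    measured separately in every recorder/microphone combination.
    Returns the number of samples written.
    """
    nyquist = sample_rate / 2
    if any(f <= 0 or f >= nyquist for f in freqs_hz):
        raise ValueError("test frequencies must lie between 0 and Nyquist")
    n_tone = round(tone_s * sample_rate)
    n_gap = round(gap_s * sample_rate)
    frames = bytearray()
    for f in freqs_hz:
        for n in range(n_tone):
            sample = peak * math.sin(2 * math.pi * f * n / sample_rate)
            frames += struct.pack("<h", int(round(sample * 32767)))
        frames += b"\x00\x00" * n_gap  # silence between tones
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit PCM, uncompressed
        w.setframerate(sample_rate)
        w.writeframes(bytes(frames))
    return len(frames) // 2


if __name__ == "__main__":
    # Hypothetical frequency set; adapt to the species under study.
    write_test_tones("test_tones.wav", [500, 1000, 2000, 4000, 8000])
```

The resulting file would be played back through a loudspeaker and recorded simultaneously with all equipment combinations; the loudspeaker's own frequency response then affects all recordings equally.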

In general, quality of the recording equipment is obviously important. Researchers planning to do 
bioacoustical research, or to work on taxonomically complex anuran groups where detailed statistical comparisons 
of bioacoustical variables are necessary, should adhere to best practice and work with solid-state recorders with 
external microphones. An external uni-directional microphone is certainly part of best-practice equipment and is 
particularly useful in noisy environments where there is the need to partly suppress unwanted sounds and focus on 
a particular target sound (i.e., a calling frog individual). 

However, not in all situations will such best-practice equipment be available. Many observations of rare frogs 
are made opportunistically, for instance after heavy thunderstorms, and often by biologists who were not even planning 
to do bioacoustical research. In such situations, it should be considered that built-in microphones of many digital 
recorders available today are commonly of very good quality and, in most cases, are sufficient to obtain good to 
very good recordings. In semi-professional equipment, these built-in microphones often are accompanied by an 
option for setting a directional function. Even in most digital cameras and smartphones, there are options for 
recording sounds; even if these will often be of comparatively poor quality, they can be useful at least to extract 
rough information on call structure. 

In order to provide a first, even if not fully representative evaluation of the effect of recording equipment on 
acoustic measurements, we performed a field trial with one individual of Bombina bombina recorded 
simultaneously with four different combinations of recorder/microphone, as follows: (1) Tascam DR05 digital 
recorder with Sennheiser K6+ME66 microphone; (2) Edirol R09 recorder with built-in microphone; (3) Apple 
iPhone 6 with built-in microphone; and (4) Sony D6C analog tape recorder (fitted with a type II cassette tape) with 
Audio Technica ATR6250 external microphone. All digital recordings were performed at a sampling rate of 44,100 
Hz and the analog recording was later digitized with CoolEdit Pro software at the same sampling rate. Sound files 
were amplitude-normalized and automatic measurements were taken with the aid of SoundRuler 0.9.6 software 
(Gridi-Papp 2003). This program allows the use of facultative algorithms for quick and objective acoustic 
measurement of various call features, with high accuracy (Bee 2004b). We opted for automated call recognition 
and measurements to eliminate observer bias in the comparison. We set the call recognition parameters manually 
and visually inspected oscillograms and spectrograms until all calls were measured under the same settings (1024 
FFT, 90% overlap; spectral resolution = 43 Hz). A total of nine calls from a single call series exhibited sufficient 
quality for measurements. The variables bandwidth (-10 dB), call rise time, envelope shape, and call duration 
diverged the most between recording equipment used (Figs. 19-20). The results can be explained by differences in 
signal-to-noise ratio and its effects on the delineation of pulses and bandwidth measurements. The Tascam / 
Sennheiser and Sony / Audio Technica combinations showed the lowest median values of bandwidth and rise time, 
and were probably closest to the true values. There was, however, a broad overlap of values among all four 
equipment combinations, and the differences detected were mostly in temporal features, where the maximum differences between 
group means were in call duration (25 ms) and call rise time (30 ms). Spectrally, the dominant frequency was 
identical among all recording equipment combinations, and the detected variation in bandwidth is probably of little 
relevance as the maximum difference between group means was only 0.75 Hz (Fig. 19). 
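
The spectral resolution quoted above follows from the sampling rate and the FFT size. As a small worked example (the function names are our own, and the values describe nominal bin spacing only, since the effective resolution also depends on the window function):

```python
def spectral_resolution_hz(sample_rate_hz, fft_size):
    """Spacing between adjacent frequency bins of the FFT."""
    return sample_rate_hz / fft_size


def window_ms(sample_rate_hz, fft_size):
    """Duration of one analysis window in milliseconds."""
    return 1000.0 * fft_size / sample_rate_hz


def hop_ms(sample_rate_hz, fft_size, overlap):
    """Time step between successive windows for a fractional overlap."""
    return window_ms(sample_rate_hz, fft_size) * (1.0 - overlap)


# Settings used in the field trial: 44,100 Hz, 1024-point FFT, 90% overlap.
print(round(spectral_resolution_hz(44100, 1024), 1))  # 43.1
print(round(window_ms(44100, 1024), 2))               # 23.22
print(round(hop_ms(44100, 1024, 0.90), 2))            # 2.32
```

Each analysis window thus spans roughly 23 ms, which is of the same order as the reported differences in call duration and rise time between devices.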

In addition to these measurements, which derive from pre-defined call properties, we also assessed overall 
acoustic similarity between calls from different recording combinations using spectral cross correlation analysis. 
Spectral cross correlation analysis slides one spectrogram over another and reports the maximal similarity value 
(0-1) found between the two sounds. We selected one Bombina bombina call recorded simultaneously with the 
four combinations of recording equipment and applied spectral cross correlation analysis as implemented in 
SoundRuler software. The spectral correlation matrix was plotted along with a dendrogram (complete linkage 
clustering). This suggests that spectral properties of calls analyzed from the two digital recorders (with or without 
external microphone) were very similar. The recordings from the analog recorder (Sony) were still more similar to 
these two than those of the iPhone, which were the most divergent overall. 
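
The principle of spectral cross correlation can be illustrated with the following minimal sketch. It is not the SoundRuler implementation: spectrograms are assumed to be given as lists of time frames (each a list of non-negative magnitudes per frequency bin), similarity is computed as a normalized pixel-by-pixel product, and the minimum overlap required between the two spectrograms is an assumption of this example.

```python
import math


def frame_similarity(frames_a, frames_b):
    """Normalized pixel-by-pixel similarity of two equally long frame lists."""
    dot = sq_a = sq_b = 0.0
    for fa, fb in zip(frames_a, frames_b):
        for x, y in zip(fa, fb):
            dot += x * y
            sq_a += x * x
            sq_b += y * y
    if sq_a == 0.0 or sq_b == 0.0:
        return 0.0  # an empty (silent) overlap carries no similarity
    return dot / math.sqrt(sq_a * sq_b)


def max_cross_correlation(spec_a, spec_b, min_overlap=None):
    """Slide spec_b along the time axis of spec_a; return peak similarity."""
    len_a, len_b = len(spec_a), len(spec_b)
    if len_a == 0 or len_b == 0:
        raise ValueError("spectrograms must contain at least one frame")
    if min_overlap is None:
        # Assumption: require at least half of the shorter spectrogram.
        min_overlap = max(1, min(len_a, len_b) // 2)
    best = 0.0
    for lag in range(-(len_b - min_overlap), len_a - min_overlap + 1):
        start_a = max(0, lag)
        start_b = max(0, -lag)
        n = min(len_a - start_a, len_b - start_b)
        if n < min_overlap:
            continue
        sim = frame_similarity(spec_a[start_a:start_a + n],
                               spec_b[start_b:start_b + n])
        best = max(best, sim)
    return best
```

For magnitude spectrograms the returned value lies between 0 (no shared energy at any time offset) and 1 (identical pattern at some offset), so a small time shift between two recordings of the same call does not reduce the similarity.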

These results indicate that, in natural settings, the quality of the microphone and recorder impacted more 
heavily on the fine-scale temporal parameters of call recordings than on the registered dominant frequency. 
However, we reiterate that devices such as smartphones and cameras with built-in microphones are to be used only 
as a last resort, in cases with no other option available to record a frog call, and cannot be considered as best- 
practice equipment. 

FIGURE 19. Comparison of the recording performance of four different recorder/microphone combinations on a set of nine 
advertisement calls of one individual of Bombina bombina at Schorfheide-Chorin Reserve, Germany. All recordings were made 
simultaneously at the same recording distance, and same calls were thus compared. Call variables were automatically assessed 
using SoundRuler software (see text for details on the methods employed). Recording equipment as follows: Tascam DR-05 
digital recorder/Sennheiser K6+ME66 microphone; Edirol R09 recorder with built-in microphone; Apple iPhone 6 with 
built-in microphone and recording software; Sony D6C cassette tape recorder with Audio Technica external microphone 
(recordings digitized with CoolEdit Pro software at sampling rate of 44.1 kHz). Rise time is the time from the start of a call to 
the point where it reaches the maximum amplitude. Shape-on is the ratio between the rise time and the total duration of a call. 
Other call properties as defined in the text. Boxplots show median (middle line), first and third quartiles (lower and upper box 
limits), and non-outlier range (whiskers). 


Ultrasounds 

When choosing recording equipment and interpreting the results of spectral analysis, it is important to keep in mind 
that many standard microphones are optimized for recording in the frequency range audible to humans. Even if the 
default sampling rate used by many programs would allow detecting sound emissions up to 20 kHz, these are 
simply not recorded by many microphones (although many built-in microphones of modern mobile digital 
recorders have a frequency response of up to 20 kHz). Recent research has, however, identified at least three frog 
species (Odorrana tormota, O. livida and Huia cavitympanum) that emit and detect vocalizations in the ultrasound 
spectrum (Feng et al. 2006; Feng & Narins 2008; Arch et al. 2008, 2009, 2012; Shen et al. 2008, 2011). There is 
also an indication that distress calls of the Neotropical Haddadus binotatus reach ultrasound frequencies (Toledo 
& Haddad 2009). While pure ultrasonic frog calls might be exceptional and geographically restricted to noisy 
torrent environments, the existence of some ultrasound components in anuran vocalizations is probably more 
frequent than currently recognized (Orrico et al. 2014). Our own recordings of a variety of anurans indicate that 
especially in miniaturized frogs, important components of the emitted frequencies are above 20 kHz, and also in 
other frogs, harmonics can reach the ultrasonic range (Fig. 21). It is uncertain whether these high-frequency components 
can be perceived by conspecifics, and it appears unlikely that they convey important signals. However, to fully 
understand vocalizations, especially in frogs living in noisy environments or of very small body size, it might be 
useful to perform recordings with equipment suitable for ultrasounds. This is possible, for instance, by using 
several of the more advanced bat detectors commercially available, or by using special ultrasonic microphones 
such as the Ultramic 200k (Dodotronic, Italy). 
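
Besides the microphone, the sampling rate limits the highest frequency that a digital recording can represent (the Nyquist frequency, half the sampling rate). The short sketch below illustrates this relationship; the function names and the list of sampling rates are illustrative assumptions, and a sufficient sampling rate does not help if the microphone itself does not respond at these frequencies.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


def can_represent(sample_rate_hz, frequency_hz):
    """True if a component at frequency_hz lies below the Nyquist frequency."""
    return frequency_hz < nyquist_hz(sample_rate_hz)


# Can components up to 40 kHz be represented at these sampling rates?
for rate in (44100, 96000, 192000):
    print(rate, nyquist_hz(rate), can_represent(rate, 40000))
```

At the common 44,100 Hz setting, no component above 22,050 Hz can be represented, whatever microphone is used.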

FIGURE 20. Results of a spectrogram cross-correlation analysis (FFT = 1024, window = Hanning, overlap = 90%) comparing 
one call of Bombina bombina at Schorfheide-Chorin Reserve, Germany, recorded with four different combinations: Tascam 
DR05 digital recorder with Sennheiser K6/ME66 microphone (Tascam); Edirol R09 recorder with built-in microphone 
(Edirol); Sony D6C analog tape recorder (fitted with a type II cassette tape) with Audio Technica ATR6250 external 
microphone (Sony); and Apple iPhone 6 with built-in microphone (iPhone). Values in the cells represent pairwise pixel-by- 
pixel similarity values between spectrograms, produced from the respective recordings of the same individual call. 


Available sound analysis software 

A number of computer programs are available for sound analysis. Most have been designed for general scientific 
sound analysis (sometimes for speech, e.g., PRAAT, Signalyze) and are suitable for analysis in anuran 
bioacoustics. Some have the advantage of having been specifically designed for animal research, often primarily 
for bird and bat calls. In the field of anuran bioacoustics, more specifically in anuran call descriptions, the programs 
most commonly cited are Canary and Raven; other software regularly employed in call descriptions are Adobe 
Audition (formerly Cool Edit), Audacity, Avisoft, Syrinx, Seewave package for R, Signalyze, and SoundRuler. 
The use of other computer programs such as Arbimon, Goldwave, PRAAT, Signal/RTS, WASIS, or Sound 
Analysis Pro seems less frequent in anuran bioacoustics. The perfect software does not exist and all these programs 
have interesting features, qualities, and shortcomings. Cost of the license can also be an issue. We provide below a 
short overview of the most frequently cited computer programs in anuran call descriptions, with some pro and 
contra arguments. As of today, using Raven in combination with the Seewave package for R seems a good 
compromise between performance, ease of use, and cost; both work on multiple platforms, which is an additional 
advantage in terms of reproducibility of measurements (i.e., whatever the platform used, analyses are reproducible 
using the same software). However, all of the mentioned programs are adequate for analysis of anuran calls for 
taxonomic purposes, and all are able to handle uncompressed sound file formats such as *.WAV which are 
recommended for analysis (see above). Of the plethora of compressed sound file formats available, many will not 
be readable by most of the programs. 

The software Raven (Bioacoustics Research Program 2011) works on multiple platforms (Linux, Macintosh, 
Windows). Originally designed for animal acoustics, this software can perform most analyses required in anuran 
taxonomy. The interface is intuitive and customizable, and there is the useful feature to store measurements in 
tables. A free (limited) version is available, and the pro version license is relatively inexpensive (400 USD/ca 360 
EUR for the academic version; discounts of 25-100% are applied to users from developing countries). Recent 
versions of the software now provide ‘robust signal measurements’ that are less dependent on the selection 
rectangle and should be preferred over the selection-dependent measurements. One possible shortcoming is that 
spectrogram illustration may sometimes require graphic refinement before publication, but this is mostly a matter 
of taste. In Appendix 1a, we provide a step-by-step guide to this program. 

Seewave (Sueur et al. 2008a) is a package for R (R Development Core Team 2015). It supports scripting, so that 
automated and customized analyses become possible, although these require substantial programming 
skills. One of the biggest advantages of Seewave is the possibility to produce graphically appealing spectrograms 
and oscillograms, as used for several figures herein. In Appendix 1b, we provide a protocol for spectrogram 
production with this program. 

Audacity (http://audacity.fr/) also works on multiple platforms (Linux, Macintosh, Windows). It is free and the 
interface is intuitive. Audacity is an excellent program for editing/exporting sounds, as well as for exploring 
spectral features, but it is not as powerful as most software dedicated to call analysis. It is also sometimes less 
convenient to use (e.g., there is no side-by-side view of spectrograms/oscillograms). 

Adobe Audition (Adobe Systems Software; formerly Cool Edit) works on Macintosh and Windows platforms. 
Like Audacity, Adobe Audition is primarily designed for digital audio editing. There are multiple display setting 
options, including parallel view of different recordings. Filtering, sample type conversion and other useful 
functions are easy to apply. The program is excellently suited for viewing or screening long recordings, as the 
implemented zooming tools are very powerful and comfortable in use. However, licensing is somewhat expensive 
with prices starting from 26 USD/ca. 23 EUR per month and there is no side-by-side view of spectrograms and 
oscillograms. 

SoundRuler (Gridi-Papp 2003) is a free and open source program that runs on Windows, Macintosh and Linux 
platforms, specially designed for quick and objective measurement of relatively simple and repetitive sounds such as 
those typical of many anurans. The program uses facultative algorithms for measuring various call features with 
high accuracy (reviewed by Bee 2004b). Although standardization, objectivity and speed are its main strengths, the 
software also provides publication-grade graphics. Its main drawbacks are possibly: (1) navigation and sound 
visualization are a bit cumbersome; (2) determining the optimal call and pulse recognition settings can be 
complicated and time-consuming as the user manual is not very informative; (3) customizing the graphs takes a 
comparatively long time. In summary, SoundRuler is the ideal software for fast processing of simple sounds but 
with a steep learning curve for the user. 

Syrinx (http://www.syrinxpc.com/) works on Windows only and offers basic analysis and setting abilities. 
Syrinx is free, simple (only grayscale spectrograms) and easy to use, but currently not in active development. 

Avisoft-SASLab Pro (Specht 2006; Avisoft Bioacoustics, Germany) works on Windows only. It is animal- 
specific and can perform all analyses required in anuran taxonomy (perhaps outperforming Raven and SoundRuler 
in objectivity and accuracy of measurements, and graphical output). The interface is intuitive and a free (limited) version is 
available, but the pro version license is rather expensive (1800 EUR/ca. 2030 USD for the educational version). 

FIGURE 21. Spectrograms and oscillograms showing frog advertisement calls with components at frequencies higher than 
usually reported, partly reaching into the ultrasound spectrum. In these four frog species, the dominant frequency is below 10 
kHz but harmonics are visible at much higher frequency. Eleutherodactylus iberia (from Bahia de Taco, Cuba) and Stumpffia 
sp. [Ca6 Vieites et al. 2009] (from Andasibe, Madagascar) are miniaturized frogs with <11 mm snout-vent length (SVL) 
whereas Eleutherodactylus [Ca4 Rodriguez et al. 2010b] and Platypelis barbouri are small sized frogs around 20 mm SVL. 
Note that frequency of the spectrograms goes up to 40 kHz. Eleutherodactylus [Ca4] has almost no frequency components in 
the ultrasound spectrum, yet frequency reaches distinctly above 10 kHz, much higher than usually reported for frogs. 
Recordings were made with ultrasound microphone Ultramic200k (Dodotronic, Italy). Graphics produced with the R package 
Seewave (Sueur et al. 2008a). All spectrograms produced with a Hanning window function, 1024 bands resolution. Format of 
candidate species names follows Padial et al. (2010). 


Automated recording and signal recognition 

A number of technological advances now make possible the automatic acquisition, storage, and processing of large 
amounts of acoustic information and have led to the development of soundscape ecology, a sub-discipline of landscape 
ecology (Pijanowski et al. 2011). 

There is an obvious gap between the increasing number of species descriptions of anurans, and the lack of 
detailed quantitative descriptions of their vocalizations (Bee et al. 2013a, b). The individual call repertoire, 
variation, plasticity and change over time might be underestimated (Narins et al. 2000; Jansen et al. 2016b) and 
knowledge on the effective signal space of individuals, populations or species might be of taxonomic relevance. 
Automated recorders and signal processing are helpful in bioacoustical monitoring approaches (e.g., Terry et al. 
2005; Tripp & Otter 2006; Bardeli et al. 2010; Laiolo 2010; Blumstein et al. 2011) and thus in ecological studies 
and conservation. There are several recent works using modern passive recording devices for the estimation of 
biodiversity in holistic approaches (e.g., Sueur et al. 2008b; Blumstein et al. 2011; Depraetere et al. 2012; Gasc et 
al. 2013; Potamitis 2014; for a review see Obrist et al. 2010). Relatively few studies target anuran assemblages 
(Bridges & Dorcas 2000; Todd et al. 2003; Acevedo et al. 2009; Acevedo & Villanueva-Rivera 2006; Villanueva- 
Rivera 2007; Waddle et al. 2009; Llusia et al. 2011; 2013a, b; Ospina et al. 2013) or concern the acoustic 
monitoring of single rare frog species. Results are potentially relevant for taxonomy, if comprehensive and 
quantitative call descriptions are provided (Akmentins et al. 2014; Jansen et al. 2016b; Willacy et al. 2015). 
However, recordings obtained from automated devices can be affected by environmental effects (e.g., differential 
excess attenuation, reverberation on trees and shrubs) and might be biased if calling individuals are too close or too 
far from the automated recorder, so their use in taxonomy must be approached with caution. 

For the long-term acoustic monitoring of natural habitats, autonomous, waterproof recording devices 
(Automatic Recording Systems, ARS) can be installed in the field. Many studies use Frogloggers (construction 
manual given by Peterson & Dorcas 1994; e.g., Acevedo & Villanueva-Rivera 2006; Akmentins et al. 2014) or the 
Songmeter recorders (Wildlife Acoustics Inc.; e.g., Lehmann et al. 2014; Zwart et al. 2014; Ganchev et al. 2015; 
Jansen et al. 2016b). For the automatic detection of particular animal sounds there is a variety of algorithms that 
can be used, and several programs on the market have already implemented species identification tools for several 
taxonomic groups (Ganchev et al. 2015). Although it is not the focus of this paper to evaluate this increasing 
body of software, algorithms or machine learning techniques for signal detection (Acevedo et al. 2009; Huang et 
al. 2009), we will give some commonly used examples - without evaluating their efficiency. 

The program Raven Pro includes two kinds of detectors (amplitude and band limited energy) that create 
‘selections’ based on pre-selected thresholds. Those selections result in measurements such as dominant frequency 
or call duration (Jansen et al. 2016b). 
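
The general idea of a threshold-based detector can be sketched as follows. This is a simplified illustration and not the algorithm implemented in Raven Pro: the function name, the merging of short silent gaps and all parameter names are assumptions of this example.

```python
def detect_selections(samples, sample_rate_hz, threshold, max_gap_s=0.05):
    """Return (start_s, end_s) selections where |amplitude| >= threshold.

    Quiet stretches of up to max_gap_s (e.g., silent intervals between the
    pulses of one call) do not terminate a selection.
    """
    max_gap = round(max_gap_s * sample_rate_hz)
    selections = []
    start = None       # index of the first loud sample of the open selection
    last_loud = None   # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud > max_gap:
            selections.append((start / sample_rate_hz,
                               (last_loud + 1) / sample_rate_hz))
            start = None
    if start is not None:  # close a selection still open at the file end
        selections.append((start / sample_rate_hz,
                           (last_loud + 1) / sample_rate_hz))
    return selections


def durations_s(selections):
    """Call duration derived from each selection."""
    return [end - begin for begin, end in selections]
```

Both the threshold and the tolerated gap strongly influence where a selection begins and ends, which is one reason why automatically derived durations require verification.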

Song Scope (Wildlife Acoustics Inc. 2014) is software that uses recognizers created by the user based on 
reference vocalizations of the targeted species, and it provides the user with measurements, for instance of 
dominant frequency and call duration (e.g., Waddle et al. 2009; Zwart et al. 2014; Willacy et al. 2015). 

Arbimon II is a web-based network for storing, sharing, and analyzing acoustic information (Aide et al. 2013; 
Ospina et al. 2013). The website (arbimon.sieve-analytics.com) provides a module for viewing, listening, and 
annotating recordings, as well as an interface for automated species identification based on the Hidden Markov 
Model (HMM) algorithm. 

Avisoft-SASLab Pro includes automated parameter measurements and classification of sounds by means of 
spectrogram cross-correlation or pulse train analysis (e.g., birds: Frommolt & Tauchert 2014; frogs: Hanna et al. 
2014). 

XBAT (Bioacoustics Research Program of the Cornell Laboratory of Ornithology, 
https://dl.dropboxusercontent.com/u/4142063/build/home.html) is written for MATLAB and designed to satisfy the diverse 
sound analysis needs of scientists who deal with large-scale datasets. It provides a vast array of tools for call 
detection, measurement, and illustration that can be customized to the specific research needs through 
programming interfaces. 

WASIS is freeware that compares sound files (e.g., from different species) by means of two algorithms: the 
Hidden Markov Model (HMM), which is based on machine learning, and power spectrum correlation analyses 
(Tacioli et al. 2016). 

Finally, packages written in the R environment, like monitoR (Hafner & Katz 2014) or Seewave (Sueur et al. 
2008a), or in the MATLAB environment (‘semiautomatic procedure’; Castellano & Rosso 2006; Rosso et al. 2006) 
may be promising tools for signal recognition. 

Automated detection and recognition programs, however, still have their shortcomings and limitations, and 
results should not be used without careful verification. It needs to be taken into account that the automated 
detection of calls in long-term recordings depends on recording quality and on the recording situation. Generally, automated 
detection works better in recordings made under low ambient noise conditions, with single individuals of frogs that 
vocalize with a rather simple signal structure (e.g., Jansen et al. 2016b). 

Another category of software that can be expected soon to increase in importance consists of web-based and 
mobile applications for analyzing and detecting calls. One web-based tool targeted at anuran calls is Arbimon (see 
above). To date, no mobile application for the identification of anuran calls is available; however, an ‘app’ with 
identification keys for frogs was recently developed (Parveen et al. 2014), and the implementation of acoustic 
identification in such ‘apps’ will probably be realized soon, as already done for bats (BatMobile: 
http://batmobile.blogs.ilrt.org/) or birds (Bird Song ID: http://www.isoperla.co.uk/BirdSongld.html; Warblr: 
http://warblr.net). 

Obviously, call recognition algorithms are developed to identify sounds based on their similarities to training 
models (e.g., samples of known calls). Because taxonomists frequently aim at describing the unknown biodiversity, 
the application of call recognition algorithms to taxonomic problems is limited and we strongly recommend not 
relying solely on automated analyses in purely taxonomic approaches. In our experience, at their current stage, 
automated analyses may measure sound structures that are not part of the target sound. This is 
particularly evident in recordings of lower quality (e.g., low amplitude of the target sound, background noise, 
overlapping calls, echoes, etc.). A proper analysis of call characters for taxonomic purposes and comparisons can 
indeed benefit from automatic measurements, but will always require an expert validation of results, especially 
regarding the identification and separation of the target sounds and to control for different kinds of artifacts. 


Common pitfalls and recommendations for recording and editing sounds 

Recording sounds in the field, and analyzing the sounds on a computer, are straightforward with the advent of solid- 
state digital recorders and menu-driven sound-processing software. However, a number of artifacts may occur in 
the process and need to be taken into consideration. 

In the field, manual adjustment of the recording level is of high importance. The recording level should always 
be set so that no oversaturation occurs at the highest amplitude corresponding to the targeted call. Automated 
leveling of recording might lead to numerous artifacts, especially to grossly distorted frequency information. It can 
also lead to inaccurate temporal measurements, especially if a frog suddenly starts calling after a longer period of 
silence and the device automatically levels down the recording after detecting the sudden sound energy. In 
recorders that have a level meter with a pointer, caution is necessary when recording high-frequency 
vocalizations of short duration, because the pointer does not have sufficient time to reach the correct position, giving 
a lower reading of sound intensity and resulting in oversaturated recordings. In such cases, it is necessary to 
record using a level threshold distinctly below the limit of saturation. 
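
Oversaturation can also be checked after recording by counting samples that sit at the numerical limit of the file format. The following minimal sketch assumes an uncompressed 16-bit PCM WAV file; the function name and the optional margin parameter are our own, and what fraction of clipped samples is tolerable is left to the judgment of the researcher.

```python
import struct
import wave


def clipped_fraction(path, margin=0):
    """Fraction of samples at (or within `margin` of) full scale.

    Handles 16-bit PCM WAV files only; samples of all channels are pooled.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    if not samples:
        return 0.0
    limit = 32767 - margin  # largest positive 16-bit value, minus margin
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)
```

A value clearly above zero within the target call suggests that the recording level was set too high and that spectral measurements may be affected.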

Many microphones have switchable built-in frequency filters (e.g., low noise filter). Although in the hands of 
experts these can be very useful, we recommend switching these off during recording of anuran vocalizations, as 
filter settings may affect frequencies that are part of the acoustic signal. 

A past problem was the irregular recording or playback of defective tapes (damaged by humidity or by worn- 
out drive belts), leading to irregular temporal representation of the recording in digitized files obtained from those 
tapes. A related phenomenon with digital recorders is the presence of small and easily overlooked switches that 
lead to accelerated or decelerated recording, which grossly distorts temporal and spectral parameters. Because such 
functions may accidentally be switched on during transport, regularly checking recorded sounds by playback is 
important. 

Not all cases of distinct frequency bands in a spectrogram are indicative of true harmonics. As shown above 
(Fig. 4), at high FFT resolution a spectrogram will automatically be structured in frequency bands, even if the call 
is not tonal and does not contain distinct harmonics. The presence of distinct frequency bands, not reflecting 
harmonics, is most often obvious in rapidly pulsed or strongly pulsatile calls (see Fig. 22 for examples). Such calls 
are particularly prone to result in frequency bands in spectrograms (even at low FFT resolution), sometimes 
resembling harmonics. These frequency bands are caused by high rates of emission of acoustic structures (pulses), 
and thus reflect the pulse rate (Jackson 1996; Gerhardt 1998), as empirically demonstrated early by Watkins 
(1967), who found a relationship between pulse rate and frequency band intervals and their relative energy. Also 
the poorly explored phenomenon of sidebands (Frommolt 1999) should be mentioned in this context. These are 
frequency bands typically occurring in pairs within frequency-modulated or amplitude-modulated sounds, 
equidistant above and below the dominant frequency. Sidebands can be a natural phenomenon, but might 
also be caused by technical artifacts of the recording or analysis device, or by interaction with unrelated sounds. 
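
The relationship between pulse rate and the spacing of such bands can be illustrated with a synthetic signal. The following Python sketch is an illustration only and is not taken from the cited studies: the carrier frequency (2000 Hz), the pulse rate (250 pulses/second), the smooth raised-cosine pulse envelope and the helper name amplitude_at are all assumptions chosen for simplicity.

```python
import cmath
import math


def amplitude_at(signal, sample_rate, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n


SAMPLE_RATE = 8000   # Hz (assumed)
CARRIER_HZ = 2000    # assumed carrier frequency of the synthetic call
PULSE_RATE = 250     # assumed pulses per second

# One second of a carrier whose amplitude rises and falls once per pulse.
pulsed_call = [
    0.5 * (1.0 - math.cos(2 * math.pi * PULSE_RATE * i / SAMPLE_RATE))
    * math.sin(2 * math.pi * CARRIER_HZ * i / SAMPLE_RATE)
    for i in range(SAMPLE_RATE)
]

# Energy appears at the carrier and at carrier +/- pulse rate,
# but not halfway between these bands.
carrier = amplitude_at(pulsed_call, SAMPLE_RATE, CARRIER_HZ)
upper = amplitude_at(pulsed_call, SAMPLE_RATE, CARRIER_HZ + PULSE_RATE)
lower = amplitude_at(pulsed_call, SAMPLE_RATE, CARRIER_HZ - PULSE_RATE)
between = amplitude_at(pulsed_call, SAMPLE_RATE, CARRIER_HZ + PULSE_RATE / 2)
```

In this idealized case the bands are separated by exactly the pulse rate, although the signal contains no harmonics of the carrier. Natural pulses with sharper onsets would be expected to produce further bands at the same spacing.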





FIGURE 22. Spectrograms and corresponding oscillograms of the advertisement calls of two anuran species, exemplifying the 
presence of parallel frequency bands in the spectrogram, although at comparatively low FFT resolution (256; Hanning window 
function), not representing harmonics, but resulting from a high pulse rate. Left: Section of the advertisement call of 
Elachistocleis sp., with a pulse rate of 245 pulses/second and a total call duration of 2000-3040 ms. Right: Advertisement call 
of Physalaemus albonotatus, with seven recognizable and modulated frequency bands. Its call is exceptionally rapidly pulsed, 
with an approximate rate of 470 pulses/second. Figures modified and data taken from Köhler (2000). Spectrograms and 
oscillograms produced with CoolEdit Pro. 

False harmonics (an additional number of harmonics) are commonly found in spectrograms when the 
recording was (over)saturated (Fig. 23). This generally happens when the microphone is too close to the sound 
source, and the sensitivity of the microphone and recorder (recording level and filters) are not properly adjusted. 
False harmonics can be more frequent among loud and high-pitched calls. Because true harmonics are those 
formed by multiples of the fundamental frequency, false harmonics may be detected as frequency bands that are not 
exact multiples (or fractions) of the dominant frequency (because sometimes we cannot observe the fundamental 
frequency in the spectrogram). It is possible, but insufficiently explored in anurans, that some species may filter 
lower or specific frequency bands using morphological or behavioral traits (as observed in songbirds; Greenewalt 
1968). Sometimes, the harmonics are not visible in the spectrographic display because they have low energy. The 
number of true harmonics detected in a spectrogram can be underestimated if recording levels are low. More 
importantly, the number of harmonics also depends on recording distance: because high frequencies are attenuated 
more rapidly by the environment, especially at the ground level (e.g., Kime et al. 2000), they do not propagate as 
far as low frequencies. Therefore, even at recording levels set to compensate for distance, upper harmonics will 
fade out when recordings are taken from relatively long distances. As a consequence, a recording made with the microphone 50 
cm from a frog (without oversaturating the recording) will show more harmonics than recordings obtained from 
larger distances, as here demonstrated with calls of Bombina (Fig. 23). In some cases, one of the upper harmonics 
has higher sound energy than the lower harmonics, and despite stronger excess attenuation of higher frequencies, it 
might still be visible on spectrograms of long-distance recordings, while some harmonics at lower frequencies 
might not be detected. 
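
How oversaturation adds spectral energy that the source did not contain can be sketched with a synthetic tone. The Python example below is an illustration under stated assumptions, not an analysis of real recordings: a pure 1000 Hz tone, hard clipping at half of its peak amplitude as a stand-in for an overloaded recorder, and the hypothetical helper amplitude_at.

```python
import cmath
import math


def amplitude_at(signal, sample_rate, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n


SAMPLE_RATE = 48000  # Hz (assumed)
TONE_HZ = 1000       # assumed pure tone without any harmonics
N_SAMPLES = 4800     # 0.1 s, i.e. 100 complete cycles of the tone
CLIP_LEVEL = 0.5     # assumed saturation limit of the recording chain

clean = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
         for i in range(N_SAMPLES)]

# Hard clipping: every sample beyond the limit is flattened to the limit.
clipped = [max(-CLIP_LEVEL, min(CLIP_LEVEL, x)) for x in clean]

# Energy at three times the tone frequency, before and after clipping.
third_clean = amplitude_at(clean, SAMPLE_RATE, 3 * TONE_HZ)
third_clipped = amplitude_at(clipped, SAMPLE_RATE, 3 * TONE_HZ)
```

The clean tone has essentially no energy at 3000 Hz, whereas the clipped version shows a distinct band there. In this idealized, symmetric case the added energy falls at odd multiples of the tone; real recording chains may distort in other ways.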


Conclusive remarks and future perspective 

Biologists studying amphibians apply bioacoustical methods with different purposes. In many cases, especially 
when working at the intraspecific level, high precision and statistical accuracy are of utmost importance. But frogs 
do not wait to call until complex equipment is assembled or until the rain stops and it is thus safe to take such 
equipment to the field. In times of globalization, more and more scientists (and citizen scientists) venture into 
tropical environments hosting numerous poorly known species of frogs. They will rarely have expensive digital 
recorders and microphones in their backpacks, but almost certainly will carry a smartphone or digital camera with 
video/audio function. When observing a rare frog calling, maybe one for which vocalizations are unknown, it will 
definitely be worth recording it (and making it available to the scientific community along with a photo of the 
specimen). Like many natural history observations, such fragmentary data can provide important pieces of 
evidence to better understand the biology of a species or hint at existence of new taxa, despite their limited 
usefulness in rigorous taxonomic, ecological or evolutionary evaluations. 

FIGURE 23. Spectrograms and oscillograms of calls of the same individual of Bombina bombina recorded at different 
saturations of the recording levels and different recording distances (Tascam DR-05 digital recorder and a Sennheiser K6/ 
ME66 microphone; water temperature 22.1 °C; 24 May 2015 at Schorfheide-Chorin Reserve, Germany). Calls were 
successively recorded from the same individual within a short time period of ca. 30 minutes and each spectrogram thus shows a 
different call. The upper left spectrogram is from a recording with recording levels in the field deliberately set on 
oversaturation. The other three spectrograms were analyzed with equalized levels. Note that the number of harmonics is highest 
in the oversaturated recording, but also depends on recording distance, with the highest-frequency harmonics disappearing with 
increasing distance. Sounds of birds and insects are visible on the recordings that were not filtered to allow objective 
comparison. All spectrograms made with the R package Seewave (Sueur et al. 2008a), with settings: Hanning window 
function, 1024 bands resolution, overlap = 90%. 

Anuran bioacoustics is a wide and fascinating field, and has led to important insights in behavioral ecology, 
physiology, and evolution. In the present paper, we have reviewed these aspects from a taxonomic perspective, and 
in the boxes below we emphasize the practical application of bioacoustical methods in the taxonomy of 
amphibians. We are convinced that more standardized descriptions of calls and a wider general availability of well- 
curated collections of recordings have the potential of further speeding up and improving the quality of amphibian 
taxonomy. 

To achieve this goal, we here identify a number of potentially fruitful fields of future research. With the advent of 
ever more powerful genomic methods, it will become possible to search for genes influencing call patterns. If 
integrated with morphological studies of laryngeal structures, such data will probably increase our knowledge 
about the evolution of sound signals and their associated morphological constraints. Meta-analyses of call patterns 
across multiple frog communities might aid studies on the environmental influences on particular call traits. These 
could be aided by computer-based automated sound comparison software. Eventually, it will be worth testing 
different hypotheses about the rates of call evolution, in particular whether call traits (under sexual selection) are 
more stable within species than morphological traits (under natural selection), especially in species distributed 
across wide ecological gradients. Call variation might be a driver of ‘cryptic’ diversification, and ‘acoustic 
radiations’ will potentially become evident in species-rich taxonomic groups of frogs with large distribution areas 
and several secondary contact zones. 

Many amphibians have experienced, and are experiencing, dramatic declines and extinctions (Stuart et al. 
2004; Wake & Vredenburg 2008), and at the same time there still is a large number of undescribed species (Hanken 
1999; Meegaskumbura et al. 2002; Köhler et al. 2005a; Stuart et al. 2006; Fouquet et al. 2007; Vieites et al. 2009; 
Jansen et al. 2011; Funk et al. 2012). A complete and correct taxonomy is at the core of threat status assessments, 
and considering the high importance of call differences in delimiting and identifying frog species, further 
development of methods and concepts in this field is of importance for effective conservation planning and future 
management of amphibian diversity. 

Boxes: A guide to bioacoustics in taxonomy 


Box I: Essentials for recording anuran calls for taxonomy 

Sampling strategy: For use in taxonomy, the recording of frog calls has to meet several requirements, whereas other 
criteria, important in ethological or ecological studies, might be of minor importance. Frog bioacoustical studies come in 
different flavors. Many of our recommendations stem from our own experience in the exploration of largely unknown faunas 
in remote areas of the world, where time is limited and multiple tasks (observation, collection and preservation of specimens, 
collection of ecological data, as well as call recordings) are often carried out under suboptimal field conditions. This typically 
leads also to suboptimal bioacoustical data which nevertheless are often taxonomically informative in an integrative 
framework. 

In contrast, when planning a specific study to gather bioacoustical data for clarifying the taxonomy of a particular 
group of anurans, design your sampling and fieldwork beforehand. If data exist pointing to the presence of different lineages 
or candidate species whose status is to be tested, include various localities (at least two) per lineage. Include localities both 
from the core area of each lineage and from the presumed contact or hybrid zone. Make sure you visit the different study sites 
in the peak activity of the frogs in order to obtain recordings of highly motivated males (see below). If possible, take sufficient 
time to record 10-20 males per site. Use state-of-the-art equipment (Box II), and wherever possible use the same equipment 
and settings at all study sites. 

Social context / call type: Keep in mind that the sound production with highest taxonomic relevance in anurans is the 
species-specific advertisement call of males. This is the call emitted by a male to attract a conspecific female (and sometimes 
with added signalling function to other males). The social context in which calls are emitted constitutes an important factor to 
evaluate the taxonomic relevance of possible bioacoustical differences. As several call types with different function have been 
documented in anurans, you have to be as sure as possible that you are recording calls with relevance for taxonomy. For many 
frog species, aggressive calls have been documented which occur often when there is a dense aggregation of conspecific 
males at one calling site. These calls with aggressive function are triggered by the presence of other conspecific males calling 
at close distances. Therefore, dense choruses of males of the same species have several disadvantages: difficulty in locating a 
single calling male for recording, overlap of calls from many males, and the possible predominance of a call type different 
from the advertisement call. 

In our field experience, a call can likely be considered to represent an advertisement call, if it is emitted in a 
stereotyped manner outside of very dense male aggregations. This view might be further supported if you recognize the same 
call type being emitted by several males (these not being close neighbors possibly competing for the same calling perch). 

Motivation: Individual calling motivation is possibly a factor commonly underestimated in practice and in the literature. 
Depending on different biotic and abiotic factors, calling motivation of an individual frog may vary greatly. These factors 
include for example temperature, precipitation, wind, moonlight, presence of conspecifics, presence of potential predators, 
individual hormonal stage, disturbance and others. Furthermore, in nocturnal frogs, starting to call at dusk or shortly after, it is 
commonly observed that early calls are quite different from regular calls emitted later, at full motivation. 

Be sure that the recorded male is calling on a regular basis, and that calls emitted are repeated frequently and are similar 
to the calls emitted by other males of the same population. 










Voucher specimen: Collecting the recorded individual as a scientific voucher specimen is one of the most crucial 
requirements for using bioacoustics in taxonomy. The perfect dataset would be a good call recording of a particular frog, the 
same individual collected as voucher specimen and a tissue sample of that specimen for genetic analyses. If you have no 
permits to collect the specimen, make sure to take at least a sample (buccal swab or toe clip, if allowed by local legislation) for 
subsequent molecular analysis (DNA barcoding). However, getting such data poses a practical difficulty. In many regions, 
simultaneous calling of several different frog species at the same site is a very common phenomenon. Therefore, there are 
chances that the voucher collected was not the actual source of the call recorded. 

Make sure to get as close as possible to the calling individual when recording, without disturbing it, to see that particular 
individual calling. For this, look for movement of the vocal sac or of the flanks during call emission. Try to observe (if 
applicable and possible) which unit of the call corresponds to one expiration. If you were too close and you disturbed the frog, 
step back and wait for calling activity to resume. After a recording is obtained, try to catch the individual recorded. Take note 
of all information related to the recording and the voucher specimen (preferably by adding the field number of the voucher 
specimen to the recording and adding the ID of the recording file to your field notes). 

Sometimes, calls can be heard and properly recorded, and the specimens putatively emitting the sounds can be located, 
but it turns out to be impossible to see the actual vocal sac movement that would confirm this individual as the one emitting 
the calls (e.g., because it is extremely shy, or calling from very dense vegetation, hiding places, or high in the canopy). In such 
cases, and provided that the species has a small body-size, a last resort to confirm that the recordings correspond to the 
collected specimens is to catch some individuals and keep them alive in a plastic bag. Leave this bag preferably close to the 
original collection site in the wild, and wait until they start calling from the bag again. Even if you do not obtain good 
recordings from this specimen, you will have a confirmation that this individual emits calls similar to those recorded from 
free-ranging specimens. 

Success rates in difficult cases can be improved by playback of randomly recorded calls of the same species (which do 
not have to meet all mentioned criteria, as only used for playback purposes). If recording in the field proves impossible, for 
instance in frogs with calls of very low intensity or calling under water, recordings can be achieved by keeping frogs in 
terraria and recording their calls from these. Such recordings may however slightly suffer from echo effects and other 
artifacts, and often, captive frogs will not call with high motivation. Furthermore, other call types might be stimulated: 
usually, in species with amplexus, the emission of a release call (with potential value in taxonomy) can be stimulated by 
gently pressing the flanks of the frog (simulating an amplexus). In such scenarios when recordings were obtained from 
constrained specimens, researchers should report a detailed description of the procedures, considering all potential physical or 
behavioral artifacts for the resulting recordings. 

If there is no permission to collect, take as many detailed pictures of the living voucher as necessary to identify possible 
diagnostic characters in external morphology later, measure its snout-vent length and, if allowed, take a small tissue sample or 
buccal swab for DNA barcoding. 

Deposit your recordings in sound archives. These files, besides being useful for future research, are also a testimony of 
the presence of the species in a particular place and time. 

Recording quality: A perfect call recording is the best data basis for detailed call analyses. In practice, a nearly perfect 
or even good recording is usually difficult to obtain. Many factors may reduce the quality of a recording, such as loud background 
noise (traffic, rain, fast-flowing river, wind, flying insects, sound emissions of other animals, etc.), great distance to the calling 
frog, and less suitable recording equipment. In practice, it is advisable not to refrain from making a recording even 
under inappropriate circumstances. If you are far away from the calling frog, make a recording, and subsequently try to get 
closer and make a better one, then try to get even closer and so on. In extreme cases when the only available recording device 
is your mobile phone, go ahead and use it—it is better than nothing (but see Box II for restrictions of suboptimal equipment). 


Box II: Recommendations for recording calls 

Recording equipment: Preferably use a good digital (solid-state) recording device able to record in an uncompressed 
file format (see below). In field practice, tape recorders frequently suffer from varying tape speed or other mechanical 
problems, particularly under humid/tropical conditions. They also produce sound from integrated motors, which 
unavoidably becomes part of the recording. Digital recorders, many of which were designed to record high quality music 
files, have several advantages over tape recorders, and most of today’s built-in microphones are of good to very good quality. 
Although the use of external microphones is here still recommended as best practice, these devices have become less essential 
because of the good quality of built-in microphones and the lack of motor noise produced by the device. Advantages of 
external directional microphones include that they typically capture less environmental noise and can be used at larger 
distances, which is especially helpful when recording frogs that are easily disturbed when approaching too close, or those 
calling from the tree canopy. Protection of the microphone by foam or feather windscreens can be of paramount importance, 
especially in open windy areas because pressure changes can severely affect recordings. 

Apart from the recorder itself, it is highly recommended to take a thermometer, and ideally a hygrometer, to the recording 
site. Take temperature data of air and substrate, and of water, if the frog calls from a water body or its edge. Ideally also 
measure the body temperature of each frog, for example by capturing the frog and measuring cloacal temperature with a 
fast-reacting thermocouple, or by measuring body surface temperature with an infrared laser thermometer. 

Recorder settings: As you may encounter calling frogs right in front of your ‘door’, it is important to check settings and 
battery power before you go out. Starting to check and adjust these in front of a calling frog may possibly result in 
considerable delay, disturbance of the frog and, eventually, lack of any recording. 

Use the best quality settings available on your device. This means, if a digital recorder is used, the highest sampling rate, 
highest resolution and uncompressed file format (e.g., WAV, AIFF, PCM, LPCM, BWF). If you have a stereo recording 
device, switch to mono if possible, as the stereo signal does not provide any further valuable information. If there are separate 
settings for the microphone (no matter if built-in or of external type), preferably select a directional mode to exclude as many 
undesired sounds from the recording as possible, and switch off preset frequency filters. During recording make sure that the 
recording level is adjusted properly, namely avoiding an exaggerated input level which might result in several artifacts such 
as distortion and false harmonics. A slightly too low recording level is less problematic than a level set too high. 
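
After a field session, a quick numerical check can flag recordings whose level was set too high. The following Python sketch is only one possible approach and rests on assumptions: samples are taken to be already loaded and scaled to the range -1 to 1, and the function name clipped_fraction and the tolerance value are hypothetical.

```python
def clipped_fraction(samples, full_scale=1.0, tolerance=1e-3):
    """Fraction of samples lying at, or within tolerance of, full scale."""
    if not samples:
        return 0.0
    limit = full_scale * (1.0 - tolerance)
    n_clipped = sum(1 for x in samples if abs(x) >= limit)
    return n_clipped / len(samples)


# Hypothetical sample values: three of the six sit at the saturation limit.
suspicious = clipped_fraction([0.0, 0.5, 1.0, 1.0, -1.0, 0.2])
unproblematic = clipped_fraction([0.1, -0.3, 0.6])
```

A fraction clearly above zero suggests that amplitude peaks were flattened and that the recording should be repeated at a lower input level.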

Recording time: Try to record a reasonably long section of the vocalization. The recorded section should preferably 
contain at least 10 regular calls, but the more calls are recorded the more choices are available for analyses. Try to record 
several consecutive calls emitted by the same male without pausing your recorder, even if intervals between calls are rather 
long. 

Recording distance: In practical field work, it is generally recommendable to use a recording distance between 
microphone and frog ranging from 0.5 to 1.5 m. However, reality mostly will demand a compromise, as getting too close to 
the calling frog may disturb it and result in ceased calling. In cases of highly motivated calling males, it might be possible to 
get very close to the caller. However, such a very short distance to the calling individual may likely result in unwarranted 
near-field effects which may alter the natural call. Near-field effects can be expected to be larger in very loud calls (sound 
levels of frog calls can reach 120 dB measured at 0.5 m distance) or calls with considerable sound energy at the edges of the 
equipment’s frequency range. Under unproblematic conditions, we recommend a recording distance between 0.5 and 1.5 m, 
but researchers should always compare the recorded sound (via headphones) with the natural sound and adjust recording 
distance if sound alteration is recognized. Check the recording level settings again after adjustment of the distance to the 
calling frog. 

Recording procedure: Either before you start the recording of the frog call, or after you obtained a sufficient recording, 
provide some basic data by speaking on the recording. These data should at least include your name, date, time, exact locality 
information, and taxonomic information of the frog recorded as far as known (family, genus, species group, species complex); 
and as appropriate and available, also information on temperature, humidity, habitat, perch, perch height, weather conditions 
and social context. Make sure from the spoken information to be able to distinguish recordings from different individuals later 
(e.g., using a field number system). If observed, also add information on individual calling behavior, vocal sac type and other 
observations that may later be of importance. This way, basic information is unequivocally connected to your recording file. 
You may note these data later in a field book, but leave them as audio information with the recording to assist identification of 
files later. 

If available, use headphones during recording to control for quality and unwanted noise. In case your equipment features 
a microphone with some directional character, try to find a position / direction / distance in favor of a clear call sound and 
only little background noise. 

If you are in front of a dense chorus (of a single species or several species), try to select an isolated caller for recording, 
so that later your recordings can be assigned to a single individual. 

If possible, observe the frog calling during recording. Check for synchronicity of sound and observations such as vocal 
sac inflation. If call emission and vocal sac movement are not in agreement, there is probably another individual calling close 
by. Red light torches or headlamps might be useful to observe the frogs during recording, as they disturb less than white light. 

Once you are satisfied with your recording save it as a separate file. Put your recorder, microphone and headphones in a 
safe place, such as a dry bag or container, and try to catch the recorded frog. 

Box III: Recommendations for advertisement call analyses 

It is highly preferable that call analyses are performed by the same person who recorded the calls in the field. There are 
several good reasons for it, as the recordist may remember the recording situation in the field. This is an important 
precondition because circumstances unknown to the analyzing person may result in misinterpretation of sounds on the 
recording. The recordist may easily distinguish among different call types in view of the social context, calls from other 
anuran species which were calling simultaneously, background noise, number of calling males of the recorded species, etc. If 
this is not possible, the analyzing person should demand as many details as possible from the recordist, particularly if many 
different sounds are audible on the recording. It obviously is essential to select the correct sounds for analysis instead of 
analyzing artifacts or sounds from unwanted sources. 

In order to estimate call variation, we recommend, depending on the complexity of call structure, to analyze a minimum 
of 10 calls of each individual, but more are highly preferable, particularly if the analysis reveals some notably variable call 
parameters. If recordings of more than one individual are available, include all of these in the analysis. Be transparent about 
possible shortcomings of your description by always clearly stating how many calls from how many individuals you are using 
for the description. The number of calls to be analyzed depends on the purpose of the study. In comparisons among closely 
related species with similar calls, a minimum of 10 individuals, if possible from various locations should be included. For 
statistical comparison make sure to avoid pseudoreplication: first calculate a mean value for each specimen, and then use 
every specimen as a data point in analyses at the population level. 
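
The two-step procedure for avoiding pseudoreplication can be sketched in Python as follows. The call durations and individual labels are invented for illustration, fewer calls than the recommended minimum are shown for brevity, and the function names are hypothetical.

```python
from statistics import mean, stdev


def individual_means(calls_by_individual):
    """Step 1: reduce all calls of each individual to a single mean value."""
    return {frog: mean(values) for frog, values in calls_by_individual.items()}


def population_summary(calls_by_individual):
    """Step 2: treat each individual mean as one data point."""
    per_frog = list(individual_means(calls_by_individual).values())
    return mean(per_frog), stdev(per_frog), len(per_frog)


# Hypothetical call durations in ms; individuals contribute unequal numbers of calls.
call_durations_ms = {
    "frog_A": [100.0, 102.0, 98.0],
    "frog_B": [110.0, 110.0],
    "frog_C": [120.0, 118.0, 122.0, 120.0],
}

pop_mean, pop_sd, n_individuals = population_summary(call_durations_ms)
```

Here the sample size is the number of individuals (three), not the number of calls (nine), so an individual that happened to be recorded for longer does not gain extra weight.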

Prior to starting an analysis: Assuming that you have a digital file on your computer ready for analysis, it is important to 
listen to the recording once completely. Apart from listening to some basic information provided on the recording by the 
recordist (supposed species recorded, date, locality, habitat, temperature, other species on recording, etc.), which should be 
noted now once again, you may detect artifacts or recording problems at this stage. First, make sure that the technical transfer 
to your computer did not alter the recording. Is the overall sound of the recording clear and without unusual shifts in 
frequency? Frequency shifts are sometimes a problem of older tape recordings as the respective recorders were running at 
varying speed under certain circumstances. If it is easy to hear the call of the target species clearly, undistorted and well- 
separated from other noises, the conditions for an analysis are probably good. If there is a lot of background noise, calls from 
non-target species, overlapping conspecific calls, etc., the preparation of the analysis requires additional care. 

Have a close look at the spectrogram of the entire recording while listening. Is the target call clearly visible as being the 
sound with highest amplitude compared to other sounds? Is it well-separated or is there overlap with other noises or calls of 
other males? In recordings with many frogs calling plus some other sounds, it might not be an easy task to identify the 
structure to be analyzed. Try to find a section in the recording where the target call apparently is least affected by interfering 
sounds. Zoom-in to this section to become aware of the structure of the target call and listen to this section again for control. 
Then, search for this structure in other, less clear sections of the recording. If this turns out to be difficult, you may adjust the 
display settings of the spectrogram by selecting grayscale (if you are in color mode) for amplitude and lower the sensitivity. 
Depending on the setting level, this will result in display of the sounds with highest energy only. Given that the microphone 
was relatively close to the calling target male, you will probably recognize its calls with a low display sensitivity setting much 
more easily. This procedure does not alter the recording, as it simply changes the display of sounds in relation to their 
recording amplitude, and assists in identifying the target structures. 

Analysis of temporal parameters: Use only those calls for analysis that look and sound clear and ‘natural’. 

Having selected a clear and well-separated call, an oscillogram displaying the relative amplitude over time will provide a 
clear picture of the temporal structure of a call. Make sure that amplitude alignment in the oscillogram appears ‘natural’ and 
within the frame. If amplitude peaks are flattened at top and bottom, these were probably cut due to acoustic saturation (e.g., 
from an exaggerated recording level, alteration by software). Apart from providing information on number of notes per call, 
number of pulses per note and other features, the temporal parameters of a call are easily measured by selecting frames with your 
cursor in such an oscillogram. Make sure to position your cursor as exactly as possible. Zooming in on a particular structure should 
facilitate exactness. Listen to your selection again before measurement to verify that all structures shown belong to the call. 
This way, measurements of basic call parameters (call duration, note duration, inter-note interval, etc.) can be obtained. When 
measuring repetition rates, keep in mind how these are calculated and that you always have to measure a certain number of 
call structures plus the exactly same number of respective intervals. 
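
One way to respect this rule is to measure from the onset of the first structure to the onset of a later one, so that the measured time span contains equal numbers of structures and intervals. The Python sketch below shows this interpretation; the function name repetition_rate and the onset times are hypothetical.

```python
def repetition_rate(onset_times_s):
    """Structures per second, computed from consecutive onset times.

    n + 1 onsets delimit n structures plus the n intervals that follow them.
    """
    if len(onset_times_s) < 2:
        raise ValueError("at least two onsets are needed")
    n_periods = len(onset_times_s) - 1
    return n_periods / (onset_times_s[-1] - onset_times_s[0])


# Hypothetical note onsets (in seconds) read from an oscillogram.
note_rate = repetition_rate([0.0, 0.5, 1.0, 1.5, 2.0])
```

Dividing the number of onsets (five) by the same time span would overestimate the rate, because the last structure would be counted without its following interval.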

In lower quality recordings, the oscillogram may not be suitable for temporal measurements, as the target structure shows 
overlap with other sounds, such as background noise, insect sounds, and other calling frog males. In such cases, it might be 
the only option to take rough measurements of temporal parameters from the spectrogram. To do so, select a low FFT 
frequency resolution (< 400) for display in your spectrogram, as this automatically increases time resolution. Search for the 
identified structures and measure these as you would do in the oscillogram. As these measurements may be less exact 
compared to oscillogram measurements, detail this procedure in your methods paragraph. 
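
The trade-off between time and frequency resolution follows from simple arithmetic. The Python sketch below assumes that the FFT resolution value refers to the window length in samples and that the recording was sampled at 44100 Hz; both are assumptions for illustration, and the function names are hypothetical.

```python
def bin_width_hz(sample_rate_hz, fft_length):
    """Spacing between adjacent frequency bins of the spectrogram."""
    return sample_rate_hz / fft_length


def window_duration_ms(sample_rate_hz, fft_length):
    """Time span covered by a single analysis window."""
    return 1000.0 * fft_length / sample_rate_hz


SAMPLE_RATE_HZ = 44100  # assumed sampling rate of the recording

for fft_length in (256, 1024):
    print(fft_length,
          round(bin_width_hz(SAMPLE_RATE_HZ, fft_length), 1),
          round(window_duration_ms(SAMPLE_RATE_HZ, fft_length), 1))
```

Under these assumptions, a window of 256 samples covers about 5.8 ms with frequency bins about 172 Hz apart, whereas 1024 samples cover about 23.2 ms with bins about 43 Hz apart. Short windows therefore suit temporal measurements, and long windows suit spectral ones.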

Older tape recordings may suffer from flutter and varying speed of the recorder. If you recognize unusual flutter by 
frequency shifts in the recording, it becomes almost useless for analysis as temporal parameters were significantly altered. 

Analysis of spectral parameters: Spectral traits are visualized with spectrograms, but spectral information should not 
be extracted from these visualizations (see Zollinger et al. 2012). Instead, power spectra / spectral analysis tools should be 
used for assessing spectral variables. For frequency analysis, choose a high FFT resolution (> 500). For identifying the 
frequency of highest energy within the call, select a single call only and apply the frequency analysis tool to check for the 
most dominant energy peak. Repeat this procedure with other calls and note respective values as these will vary more or less. 
If the call is composed of several notes, repeat this procedure by selecting single notes only. In calls with distinctly 
recognizable harmonics, state to which harmonic the dominant frequency corresponds. In some cases, initial notes may have 
their energy peak at a different frequency compared to subsequent notes, or there is an overall frequency modulation within 
the call from its beginning to the end, or notes themselves are heavily frequency modulated. In the latter case, measure the 
dominant frequency at the beginning and at the end of the note, separately. Make sure to record the respective frequency 
values and temporal position to enable a proper description of any kind of frequency modulation. In rare cases, several peaks 
of similar intensity might be present, especially in calls with harmonics and, in such cases, the frequency for all of these peaks 
should be measured and reported. Apart from the dominant frequency, the bandwidth is of importance. In clean recordings, 
we recommend calculating 90% bandwidth, whereas in recordings with important background noise, only the approximate 
prevalent bandwidth can be estimated using the frequency analyzing tool, or directly from the spectrogram. 
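
As a rough illustration of what a frequency analysis tool computes, the Python sketch below derives the dominant frequency and a 90% bandwidth from the power spectrum of a selection. It assumes that the 90% bandwidth is delimited by the frequencies below which 5% and 95% of the total energy lie, and it uses a deliberately naive transform on a synthetic test tone. Function names and the test signal are hypothetical; real analyses should rely on the tools of the chosen software.

```python
import cmath
import math


def power_spectrum(samples, sample_rate_hz):
    """Naive DFT power spectrum; returns (frequencies in Hz, power) up to Nyquist."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        freqs.append(k * sample_rate_hz / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def dominant_frequency(freqs, power):
    """Frequency of the bin holding the highest energy."""
    return freqs[max(range(len(power)), key=power.__getitem__)]


def bandwidth_90(freqs, power):
    """Frequencies below which 5% and 95% of the total energy lie."""
    total = sum(power)
    cumulative = 0.0
    low = high = None
    for freq, value in zip(freqs, power):
        cumulative += value
        if low is None and cumulative >= 0.05 * total:
            low = freq
        if cumulative >= 0.95 * total:
            high = freq
            break
    return low, high


# Synthetic "note": strong 2 kHz component plus a weaker harmonic at 4 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 2000 * i / sample_rate)
        + 0.3 * math.sin(2 * math.pi * 4000 * i / sample_rate)
        for i in range(256)]
freqs, power = power_spectrum(tone, sample_rate)
print("dominant frequency (Hz):", dominant_frequency(freqs, power))
print("90% bandwidth limits (Hz):", bandwidth_90(freqs, power))
```

Applying the same functions to single notes rather than whole calls mirrors the recommendation to measure notes separately when their energy peaks differ.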

Filtering: In many cases, frog calls in the field are recorded simultaneously with other sounds. Loud sounds produced by
cicadas and other insects can be particularly disturbing, but they often call at much higher frequencies than frogs. Filtering
certain sections of the recording may facilitate analysis, as it removes unwanted sounds and thus results in clearer
oscillograms of the call you are interested in. Filtering always has to be done in view of the spectrogram. You may apply a
lowpass or highpass filter, or both. Make sure that the call to be analyzed is not affected by your filter settings. Choose your
filter frequency settings at a distance of at least 0.5 kHz from the minimum and maximum frequency, respectively, displayed
by the call itself. After having applied the filter, listen to the filtered section again. The call should sound exactly as it did
prior to filtering. Always mention any filter settings used in the methods.
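
The margin rule can be written down as a short calculation. This Python sketch is only an illustration: the function name, the example call band of 1.8 to 3.6 kHz and the Nyquist limit of 22.05 kHz (for 44,100 Hz sampling) are assumptions, not values from this study.

```python
def filter_cutoffs(call_min_khz, call_max_khz, margin_khz=0.5, nyquist_khz=22.05):
    """Return (highpass, lowpass) cutoffs in kHz that keep a safety margin around the call."""
    highpass = max(0.0, call_min_khz - margin_khz)          # removes sounds below the call
    lowpass = min(nyquist_khz, call_max_khz + margin_khz)   # removes sounds above the call
    return highpass, lowpass


# Hypothetical call occupying 1.8-3.6 kHz in the spectrogram.
highpass, lowpass = filter_cutoffs(1.8, 3.6)
print(f"highpass at {highpass:.1f} kHz, lowpass at {lowpass:.1f} kHz")
```

The cutoffs returned here are the closest settings the rule allows; wider margins are safer whenever the disturbing sound lies well outside the call band.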

Selective amplification: In some recordings with partly soft and weak signals, it might be helpful for analysis to amplify 
a specific sound of the recording prior to analysis. This can be done without, prior to, or after selective filtering. Alternatively, 
selective negative amplification may reduce disturbance by unwanted sounds. Selective amplification, positive or negative, 
should only be done after unequivocal identification of the target sounds. If done properly, such a process may result in
clearer and more analysable structures. If applied for graphical representation, we recommend that the selectively amplified
sound be shown in a separate figure (i.e., do not mix selectively amplified with non-amplified sounds in one spectrogram).

Another option provided by several programs is to ‘normalize’ the amplitude, bringing the maximum energy peaks to a 
certain preset value. Likewise, this can be done without, prior to, or after selective filtering. In any case, check the settings for
this procedure carefully and make sure by watching and listening to the structures that the recording is not artificially altered
by this action.
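
For readers who want to see what peak normalization does to the data, a minimal Python sketch follows. It assumes samples scaled between -1 and 1 and a hypothetical target peak of 0.9; individual programs may implement normalization differently.

```python
def normalize_peak(samples, target_peak=0.9):
    """Scale all samples by one common factor so the largest absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)          # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet_note = [0.10, -0.20, 0.05, 0.15]   # hypothetical low-amplitude samples
print(normalize_peak(quiet_note))
```

Because every sample is multiplied by the same factor, the relative amplitude pattern of the call is preserved, and only the overall level changes.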

Automatic noise reduction: Most programs used for sound analysis provide the option of automatic noise reduction.
Its application usually reduces white noise, but in our experience it also affects the call itself. When listening to
noise-reduced recordings, frog calls sounded somewhat different compared to the original recording. We therefore recommend
not using automatic noise reduction.


Box IV: Recommendations for advertisement call descriptions 

Methods: The first important part of a call description is to describe the methods applied in every detail. Apart from the 
software and settings used, also describe possible filter application and procedures of measurements. If appropriate, illustrate 
measurements on an accessory figure of the oscillogram and spectrogram. Clearly state which terminology you are using in 
your description and the number of calls and number of individuals you analyzed. 

Context and circumstances: Start your call description with some information on circumstances during recording (e.g.,
date, time, locality, geographic coordinates, habitat, temperature, weather, social context). Mention if recorded calls were 
emitted continuously at more or less regular intervals, arguing for an apparently regular calling motivation (or mention the 
opposite if observed). Honestly mention all technical and biological restrictions of the recording (missing data, lack of direct 
observation, shortcomings of technical equipment). 

General call properties: Prior to providing detailed numerical parameters, it is recommended to describe the general 
properties of a call (e.g., whistle, pulsed multinote call, series of pulse groups; or using the categories of Beeman 1998). Do
calls consist of single notes or multiple notes, and are there different note types? Are calls repeated ‘endlessly’ after regular
intervals, or are call series of defined numbers of calls emitted? In addition, some detailed characteristics should be mentioned
which are not adequately described by numerical parameters alone. These details may include amplitude modulation within
calls or notes (e.g., ascending, descending), frequency modulation within calls or notes (e.g., upward sweep, terminal drop in
frequency), and differences among certain notes within a call (e.g., initial note longer, followed by shorter subsequent notes).

Terminology: Always define the acoustic units call and note and mention the rationale you are using for defining a note
and a call. If applicable, mention if you follow a call-centered or note-centered terminology, as defined herein (Fig. 7). Calls 
might consist of a single note only. In this case, it should be clearly stated. 

Always adhere strictly to the hierarchy of terms (Fig. 6). Categories other than call, note and pulse might be used in a 
flexible way, but always in the hierarchical order. Avoid using other terms than those recommended here (Fig. 6), such as 
those often used for bird vocalizations (song, syllable, dialect, and so forth). 

Comparability of calls largely depends on conformity of the terminology used in descriptions. If someone by default or 
by mistake categorizes a call as a note or vice versa, the respective call description will be barely comparable with those using 
a different definition. Taxonomy therefore clearly requires a comparable terminology. Both the call-centered
and the note-centered approach as defined herein have their advantages and disadvantages. Rather than strictly applying any
such convention, the essential point for taxonomic comparison is to apply the same term to homologous entities when comparing
the vocalizations of two species. When comparing new recordings with the literature, be aware that different researchers
might have used different terminologies. 

Always include an explicit statement on amplitude modulations (pulses). These are either recognizable or not; if they are, 
then they should be further described (pulse number, pulse rate, pulse duration, modulation depth, etc.) as appropriate. 
Especially in calls with complex structure (e.g., different note types) it can be useful to illustrate the terminology used in a 
spectrogram/oscillogram figure, with arrows and lines marking the different subunits as used in the description. 

Numerical parameters: Provide all numerical (quantitative) parameters in the same manner. For example, for 
descriptive call properties, use range (#### - ####) followed by mean ± one standard deviation (#### ± ####). Make sure you 
provide the values for all parameters characterizing a call and do not skip any. Always use the same units of measurement for 
time and frequency information, respectively. In comparative descriptions, it probably makes sense to provide comparative 
values in a table. Although not yet usually done, we recommend as future best practice to prepare and publish supplementary 
tables with original measurements of calls, distinguishing call measurements from different individuals. Provide the sample 
size (e.g., number of calls, notes, pulses, intervals) used for calculating average values and variance of each acoustic 
parameter (see as well Box III). 
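
The reporting format recommended above (range, followed by mean ± one standard deviation and the sample size) can be produced consistently with a small helper. The Python sketch below is an illustration only; the function name and the measurement values are hypothetical.

```python
import statistics


def summarize(values, unit, decimals=1):
    """Format measurements as 'min-max unit (mean ± SD, n = N)'."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 in the denominator)
    return (f"{min(values):.{decimals}f}-{max(values):.{decimals}f} {unit} "
            f"({mean:.{decimals}f} ± {sd:.{decimals}f}, n = {len(values)})")


note_durations_ms = [41.2, 39.8, 43.5, 40.1, 42.0]   # hypothetical note durations
print(summarize(note_durations_ms, "ms"))
```

Using one helper for every parameter ensures that units, decimals and the order of range, mean and standard deviation stay identical throughout a description.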

Graphic presentation of calls: The graphic presentation of a call constitutes an important part of every call description, 
particularly when directly comparing calls. Select a recording section of best quality and low interference with unwanted 
sounds for presentation. Such graphs should always encompass a spectrogram and an oscillogram of the identical section of 
the recording, hence both should be presented at the same time scale. Select the time scale for presentation to display the main 
call characters of that particular species clearly. That might be one single call, a short series of calls in case of single note calls 
or, preferably, a composite figure illustrating both. In calls with a more complex structure, it might be desirable to provide 
details of selected sections by presenting them in additional figures, using different time scales. 

The spectrogram should mainly provide information on the frequency distribution of sounds over time. However, as it is 
the most complete presentation of a call, including information on both frequency and amplitude distribution over time, most 
informative results are achieved by choosing some intermediate settings for the presentation, such as an FFT window width that
is neither exceptionally high (which would result in low time resolution and may largely mask the temporal structure of the
call) nor exceptionally low (which would result in a lack of information on frequency distribution). In practice, best results are
obtained by selecting FFT widths of 256 or 512 for the spectrogram. The use of colors versus grayscale presentation for sound 
energy in spectrograms is a matter of taste. In both cases, make sure that settings are chosen aiming at an informative and 
clear presentation of the general call characteristics. Try to avoid the presentation of unwanted sounds with similar sound 
energy (see filtering and selective amplification section above), which will hamper the recognition of call structures for the 
viewer. Almost all programs offer display settings for adjusting the sensitivity of the spectrogram. Given that the
sound of the call should be the most energetic signal in the selected section, you can easily improve clarity of the presentation
by lowering the sensitivity, which will blank out unwanted sounds of lower energy and thus yield a clear outline of the
call structure. A usual scaling for the frequency presentation in a spectrogram for frog calls is 0 to 10 kHz. However, as
recent research has revealed frog calls with a rather high frequency spectrum reaching into ultrasound, in some cases it can be
useful to choose a scaling in accordance with the frequency response of your recording equipment (given that settings for
sampling were chosen accordingly).

The corresponding oscillogram is aimed at providing information on amplitude modulation. It is a display of relative (not 
absolute) amplitude over time. It is important as it may clearly reveal species-specific differences in call structure. Its 
presentation should thus be as clear as possible. Make sure that the highest peaks of energy are not cropped in the graph but,
on the other hand, are not too low. You may adjust the presentation by amplifying or normalizing the amplitude (see above).
Background noise and unwanted overlapping sounds may strongly mask the call structure. In such cases, consider filtering 
(see above). 

When comparing calls for taxonomic purposes, provide all graphical presentations of calls at identical time scales. This
way, differences become immediately evident visually. Independent of the scaling provided by the analysis software, the time 
scale in a printed presentation always should start at zero and end with the value in agreement with the time frame chosen. We 
suggest choosing a time frame with a 'rounded' value, such as 0-0.5 s, 0-1 s, 0-10 s, or similar. In many cases, the numbers at 
axes provided by the available programs are not suited for reproduction in print, as they are too small in relation to the whole 
figure, or provide an unnecessary number of decimals. In such cases, you should modify the original graph with suitable 
editing software of your choice to meet the requirements of the respective journal and to ensure readability. It is important to
appropriately label both axes (time, relative amplitude, frequency) and mention the respective units of measurement.


APPENDIX 1. Standard Operating Protocols for software use in describing frog calls

As reviewed in detail in this paper, species-specific advertisement calls of anurans (frogs and toads) are a powerful behavioural
prezygotic isolating factor, and their usefulness in the recognition of distinct taxa in conjunction with morphological characters 
has been repeatedly acknowledged over time, even if exceptions exist. 

Call analyses have become increasingly popular in descriptions of anuran species, together with descriptions of larvae and 
genetic data, and many call descriptions appeared as stand-alone papers over the recent years. Since anuran taxonomists are not 
always experts in bioacoustics, some bioacoustical analyses unfortunately lack the necessary standardization for reliable 
diagnoses, making comparisons between congeneric taxa difficult and overly time consuming, if not simply impossible, 
especially when original recordings are not available. Comparisons are also hampered by different methods applied in different 
studies or by the lack of information about the analytical procedures. Although in-depth analyses are not always necessary for 
comparisons in a taxonomic framework (i.e., in new species descriptions), acquiring baseline information on some standard call
parameters is advisable in all cases. Kok & Kalamandeen (2008) provided a brief introduction to basic call analysis, but to our 
knowledge no standardized method has ever been proposed with the aim of (1) clearly listing the necessary data useful in 
species comparisons; and (2) explaining, step by step, how to gather and illustrate these data to non-specialists in the field of 
bioacoustics. We feel that such standardized methods would make comparisons easier, faster, and first and foremost more 
accurate. 

In this appendix we propose a series of hands-on protocols that should provide a simple, straightforward, and relatively 
fast, step-by-step procedure for bioacoustical analyses to be used in anuran species descriptions. We explain with screenshots 
how to apply the respective software, and reiterate recommendations of how important call structures should be defined, named 
and illustrated to accompany call descriptions. More general hands-on information is also found in the boxes provided along 
the main text, which, we recommend, should be consulted prior to the next steps. 

Our main protocol is based on the software Raven (Charif et al. 2010). We provide the rationale for this choice below. In 
addition, we provide a novel, very simple script that should facilitate the use of the R-based software module seewave to produce
high-quality spectrograms and oscillograms. 


APPENDIX 1A: How to apply software in analysis of anuran calls for taxonomy: a step-by-step guide
using Raven

In the main paper we have reviewed the most important software currently available for sound recordings, with their pros and
cons. A standardized method for sound descriptions requires software that (1) runs on multiple platforms (Linux,
Macintosh, Windows); (2) can acquire, visualize, measure, analyze and illustrate calls; (3) is under active development; and (4)
is easy to use with an intuitive interface. Several programs mentioned above meet these criteria, but for analysis of biological
sounds, Raven (Charif et al. 2010) is probably one of the most commonly used. Although Raven Pro is not freeware, its price is
relatively accessible, and a free version (Raven Lite, currently version 2.0) exists which can be used for the most basic analyses. 
Since Raven meets all the criteria we deem important for sound analyses, we decided to use Raven Pro 1.4 for this step-by-step 
guide. We tried to keep this tutorial as easy and straightforward as possible. Raven contains many advanced options that are not 
necessary in call descriptions in a taxonomic framework. Comprehensive information about Raven can be found in the 
program's user manual (Charif et al. 2010). After checking the current beta version of the next version of the software, we 
anticipate that this protocol will in principle be usable also with Raven Pro 1.5. 


A. Getting started 
A.1. Acquiring input

This will mostly depend on the type of recorder you used to record the call. 

(i) If the call has been recorded in uncompressed format on a memory card using a solid-state recorder (highly 
recommended, see earlier in the text), simply open the .wav (or .aif) file using Menu Bar→File→Open Sound Files
(you can alternatively use the corresponding icon in the Raven Tool Bars, or drag and drop the file on Raven’s
icon). If a new window named “Configure New Sound Window” opens, leave all values as default and click OK.
Skip the next part and go immediately to A.2.

(ii) If the call has been recorded on an audio medium other than a memory card, such as a cassette tape or a digital
audio tape (DAT), you will first have to import the call into your computer in a format suitable for use in Raven
(i.e., uncompressed formats such as WAV or AIFF). To do so, connect your recording device to your computer
(usually through the line input, but this depends on your computer model and you may need to check the computer
manual for that) and open a new Recorder window in Raven (Menu Bar→File→New→Recorder).

(iii) Select “File” in “Record To”. 

(iv) In the Input menu find the entry line of your recording device, which should appear in “Device”. Select a sample
rate of 44,100 Hz; select Left or Right channel; and select “16-bit signed PCM” in Sample Format.

(v) Values in the Display menu can be left as default. 

(vi) In the File Format menu select AIFF or WAV as File Format; select 16 bits as Sample Size; and adjust your file size 
according to the length of your recording. Raven will automatically stop importing when the selected size is
reached, so it is better to select a longer time than necessary.

(vii) In the File Name menu, go to Directory and select the folder where you want the file to be saved on your computer. 
Click OK. 

(viii) A Recorder window opens. Click on the red button at the bottom left of the window (Record to Disk) and start 
playing the call on your recording device. A signal and frequencies should appear in the window while your 
recording is playing. Once the call or the sequence of your choice has been imported in Raven, click on the red 
square at the bottom left of the window (Stop Recording to Disk). Your call is now saved in the folder selected in 
the File Name menu (see above). 

(ix) Open the .wav (or .aif) file using Menu Bar→File→Open Sound Files or drag and drop the file on Raven’s icon. If
a new window named “Configure New Sound Window” opens, leave all values as default and click OK. 
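
When choosing the file size for an import as described in the steps above, it can help to estimate how large an uncompressed recording will be. The Python sketch below assumes mono 16-bit PCM at 44,100 Hz and ignores the small file header; the function name is hypothetical.

```python
def wav_size_mb(duration_s, sample_rate_hz=44100, bits_per_sample=16, channels=1):
    """Approximate size of uncompressed PCM audio in megabytes (file header ignored)."""
    bytes_total = duration_s * sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_total / 1_000_000


# A 6:20 min (380 s) mono recording, and the same length with a generous safety margin.
print(f"{wav_size_mb(380):.1f} MB needed, {wav_size_mb(380 * 1.5):.1f} MB with a 50% margin")
```

Selecting a size somewhat above the estimate avoids the import stopping before the end of the tape.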

A.2. Acquisition of the data 

Once your sound file is open you need to zoom in on the call sequence to detect the call structure, which is the very first step
of your analysis. Figure S1 shows a continuous 6:20 min recording of Allobates amissibilis as recorded in the field and
imported into Raven.



FIGURE S1. Field recording of Allobates amissibilis. The sequence alone does not allow discriminating between calls and
notes. Zooming in on the sequence is necessary to determine the call / note structure. Red highlights the Menu Bar, blue highlights
the Tool Bars, green highlights the Side Panel, yellow highlights the Raven Desktop.

In addition to the Menu Bar (red in Fig. S1), the Tool Bars (blue in Fig. S1), and the Side Panel (green in Fig. S1), Raven
provides a Desktop (yellow in Fig. S1), with two views: a Waveform (shape of the signal, also called oscillogram; upper view
in Desktop) and a Spectrogram (spectrum of frequencies; lower view in Desktop). Raven allows you to zoom along the x- and
y-axes of these views by using the “+” and “−” buttons in the lower right corner of the Raven Desktop (Fig. S2). Whereas zooming
along the x-axis works simultaneously on both views, zooming along the y-axis works only on the selected view. To select a
view, click on its name in the Side Panel (in Views), or simply click in the view, preferably on the left side of the x-axis,
otherwise you will create a selection border in the view, which is not necessary at this stage (to remove such a selection border,
go to Menu Bar→View→Clear All Selection Borders). If you wish to come back to the initial view, click on |—| (to the right of
the zoom buttons on the x-axis and below them on the y-axis). At this stage your analysis can start.

FIGURE S2. Lower right corner of the Raven Desktop window showing the tabs used to zoom in and out of the sequence.

Before starting any analysis, make sure that the recording level of your recording is adequate. The amplitude in the
oscillogram must not exceed the maximum values of the legend in its relevant part; if it does, the
oversaturation might be due to an excessive input level during the recording, or it might have occurred while
digitizing your original recording from a tape or similar. If the original recording is oversaturated, there is little to
do except to choose another recording for analysis. In oversaturated recordings, frequency information will
not be reliable, while temporal information can still be extracted with some reliability.
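
A simple numerical check for oversaturation is to count the samples that sit at the limits of the sample format. The Python sketch below assumes 16-bit signed PCM samples already loaded as integers; the function name and the example values are hypothetical.

```python
def clipped_fraction(samples, full_scale=32767):
    """Fraction of samples at or beyond full scale (16-bit signed PCM by default)."""
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return clipped / len(samples)


# Hypothetical excerpt: two of the eight samples have hit the limits of the format.
excerpt = [1200, 15000, 32767, 32767, 9000, -4000, -20000, -100]
print(f"{100 * clipped_fraction(excerpt):.0f}% of samples are clipped")
```

Any non-negligible fraction of clipped samples within the call itself suggests that spectral measurements from that recording should be treated with caution.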

If the call amplitude is too low (or too high), Raven allows you to increase or reduce the amplitude of the complete call, or
of the active selection only. To do so, go to Menu Bar→Edit→Amplify. A dialog box pops up, indicating the options of
amplifying the entire call (Entire Sound) or the active selection only (Active Selection). Raven offers four methods of
amplification. We suggest using “multiply by factor”, which allows you to specify the factor by which the program will
multiply the call (note that amplifying by a factor between 0 and 1 will reduce the amplitude, amplifying by 0 will silence the
call). See the Raven manual (Charif et al. 2010) for more information about other amplification methods. Mention any
amplification procedures in the material and methods section of the call description.

To reduce the effect of background noise (sometimes causing heavy background color in the spectrogram) you can use the
Clipping Level parameter, which allows you to specify a “noise floor” below which any amplitude value is altered (Charif et al.
2010). To do so, go to Menu Bar→View→New→Spectrogram View. A window named “Configure New Spectrogram View”
opens; in “Clipping”, tick “Clip” and select the value below which any amplitude value must be altered (in “Values Below”).
Note that if you set this too high, portions of the signal will no longer be visible, so try different clipping levels in
order to produce a satisfactory spectrogram. As this modifies the appearance of the spectrogram, you must mention
in your material and methods whether you used this parameter.
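
The idea of a noise floor can be sketched in a few lines of Python. The example assumes a spectrogram stored as rows of power values in dB and replaces every value below the floor by the floor itself; this is an illustration of the principle with hypothetical names and values, not a description of how Raven implements its Clipping Level parameter.

```python
def apply_noise_floor(spectrogram_db, floor_db):
    """Raise every power value below floor_db to floor_db, leaving louder values untouched."""
    return [[max(value, floor_db) for value in row] for row in spectrogram_db]


# Hypothetical 2 x 3 spectrogram excerpt (rows = frequency bins, columns = time frames).
excerpt_db = [[12.0, 41.5, 18.0],
              [55.0, 25.0, 60.2]]
print(apply_noise_floor(excerpt_db, 30.0))
```

A floor set above the level of the call would flatten the call as well, which is why different levels should be compared before one is chosen.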


B. Analysis 
B.1. Call structure

The first step of the analysis should consist of a precise description of the call structure (i.e., its appearance on the
oscillogram). Identify and mention the general type of sound (categories of Beeman 1998) and whether you chose for the
description a call-centered or note-centered terminology. The following instructions largely follow a note-centered approach
but can easily be adapted to comply with a call-centered approach. You should mention if any of the characteristics listed below 
and exemplified in Fig. S3 apply to the call (see main text for definitions of these terms and when to use them). 

(1) Is there a single note per call or are there multiple notes per call? 

(2) Is the call a simple call (all notes of one note type) or complex call (i.e., containing different note types)? 

(3) Are notes pulsed, pulsatile (consisting of poorly distinguishable pulses) or unpulsed? 

And: 

(4) How are notes arranged in the advertisement call? 

(5) Is the call emitted continuously or not? 

Check the call structure as follows (Figs. S4–S5):

(i) Zoom in on the call along the x-axis (and y-axis if necessary) using the “+” as described above until you are able to
distinguish notes. You should be able to see if the call is formed by a single note, or if it contains multiple notes.
Playing (listening to) the call will help to determine this. Click on the button “Play” or “Scrolling Play” in the Tool
Bar (upper right of the window) to listen to the sequence and associate sounds with the different structures in the 
spectrogram. 

(ii) Check how notes are distributed in the call. Are they evenly spaced? Do some of them cluster? 

(iii) Check if the call is emitted continuously or if intercall (silent) intervals are visible. Continuous calls are often 
composed of a single note. 

(iv) Now, zoom closer and focus on a single note. By zooming in on the call you may lose the first note. Use the
cursor (below the Spectrogram view) to scroll the view laterally, if necessary. Is there any change in the amplitude 
modulation? A note with drastic change(s) in the amplitude modulation is called pulsed (see main text). Move the 
cursor and check all the notes within the call to see if modulation varies among notes. 



FIGURE S3. Examples of some common call types. Upper left, call composed of a single tonal note (Anomaloglossus
roraima, 3-s sequence); upper right, call composed of multiple notes of only one note type (here 8 notes, Allobates amissibilis,
3-s sequence); lower left, complex call composed of notes of different note types (Osteocephalus leprieurii, 3-s sequence);
lower right, single pulsed note with amplitude modulation within pulses (Osteocephalus taurinus, ca. 2-s sequence).

FIGURE S4. Zooming along the x-axis to detect call structure, see text for explanation.

FIGURE S5. Zooming along the x-axis to detect note structure, see text for explanation.


B.2. Temporal structure 

Temporal variables should always be measured on the oscillogram (waveform) because the spectrogram comes with a
time/frequency trade-off (Charif et al. 2010; see the main text). Overlapping calls (in choruses for instance), calls recorded over
intense background noise, and poor-quality recordings in general should be avoided when measuring temporal variables.

If only poor-quality recordings are available, or if no alternative exists for recordings containing intense
background noise or overlapping calls, a call description might still be useful and warranted, but these restrictions 
need to be clearly stated and reflected in the description parameters (e.g., by adding "ca." to any temporal 
measurements). In general, adjust the precision of information in your description to the quality of the recording. 
For poor-quality recordings, it might not be useful to give temporal measurements with a precision of 1 or 10 ms (=
0.001/0.01 s), and it might be more honest to give parameters with a "ca." precision of 20 or even 100 ms (= 0.02
/ 0.1 s). Avoid pseudoprecision that is not warranted by the quality of the underlying data!
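
One way to apply this advice consistently is to round every measurement to the precision judged appropriate for the recording before reporting it. The Python sketch below is an illustration with a hypothetical function name and example value.

```python
def report_duration(duration_ms, precision_ms, approximate=False):
    """Round a duration to the chosen precision and format it, optionally prefixed by 'ca.'."""
    rounded = round(duration_ms / precision_ms) * precision_ms
    prefix = "ca. " if approximate else ""
    return f"{prefix}{rounded:g} ms"


measured = 137.4   # hypothetical note duration read from the selection table
print(report_duration(measured, 1))                      # clean recording
print(report_duration(measured, 20, approximate=True))   # poor-quality recording
```

The same raw value is thus reported with different precision depending on recording quality, and the "ca." prefix makes the restriction visible to the reader.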


Temporal variables and related parameters measurable on the oscillogram should include:

(1) Number of notes per call (a call being defined as a series of notes emitted in groups between longer silent 
intervals); 

(2) Note duration (beginning of the note to the end of the note); 

(3) Duration of the silent interval between notes (internote interval; end of a note to beginning of the next note); 

(4) Note period (beginning of a note to the beginning of the next note); 

(5) Call duration (beginning of the first note to the end of the last note of a call); 

(6) Note repetition rate; 

(7) Amplitude modulation; 

(8) Number of pulses per note; 

(9) Pulse repetition rate (measured within notes); 

(10) Call repetition rate (number of calls per minute); 

(11) Duration of the silent interval between calls (end of a call to the beginning of the next call); 
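
Several of the variables listed above follow directly from the start and end times of the notes. The Python sketch below derives them for one call under a note-centered terminology. The function name and example times are hypothetical, and the note repetition rate is taken here as the number of notes divided by call duration, which is one of several possible conventions and should be stated explicitly in the methods.

```python
def temporal_variables(notes):
    """Derive temporal variables from (start, end) times in seconds of the notes of ONE call."""
    durations = [end - start for start, end in notes]
    # Internote interval: end of a note to the beginning of the next note.
    intervals = [nxt[0] - cur[1] for cur, nxt in zip(notes, notes[1:])]
    # Note period: beginning of a note to the beginning of the next note.
    periods = [nxt[0] - cur[0] for cur, nxt in zip(notes, notes[1:])]
    # Call duration: beginning of the first note to the end of the last note.
    call_duration = notes[-1][1] - notes[0][0]
    return {
        "notes_per_call": len(notes),
        "note_durations_s": durations,
        "internote_intervals_s": intervals,
        "note_periods_s": periods,
        "call_duration_s": call_duration,
        "note_rate_per_s": len(notes) / call_duration,   # assumed convention, see lead-in
    }


# Hypothetical call of three notes, each 50 ms long and separated by 100 ms of silence.
call = [(0.00, 0.05), (0.15, 0.20), (0.30, 0.35)]
for name, value in temporal_variables(call).items():
    print(name, value)
```

Call repetition rate and intercall intervals can be obtained in the same way by treating the start and end times of successive calls as the input.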


Call rate and silent intervals are often dependent on calling motivation, and can therefore be affected by 
disturbance. Their use is thus more limited, and these data can be considered optional. We nevertheless explain how to
determine these variables in Raven. You should not use these data for taxonomic conclusions if there is any doubt 
about the male’s calling motivation (e.g., if the individual was obviously disturbed by the presence of the 
investigator). 


Measure the temporal variables as follows: 

(i) Start by making an Excel table for your measurements as shown in Table S1 (you will export all your data at the end
of the analysis). 

(ii) Go to Side Panel→Views, and uncheck “Spectrogram 1” (Fig. S6). You do not need the spectrogram view at this
stage and can check it again later when necessary. 

(iii) Grab the lower border of the Raven Desktop View, which hides the Selection Table, and move it upwards (Fig. S6). 
You should now see an additional view called “Table 1” (Fig. S6). This view contains seven columns, four 
including default measurements, and can be used to store data until you export them to your Excel table. The
number of measurements displayed can be expanded/reduced by going to Menu Bar→View→Choose
Measurements. A new window opens with Displayed Measurements on the left side, and Available Measurements 
on the right side. 

(iv) In Available Measurements select “Delta Time” and click on « to add it to your Displayed Measurements.

(v) Select “Max Frequency” and “Peak Amplitude” as well since you will need them later. Click each time on « to 
add each measurement to your Displayed Measurements. Once finished, click OK. New columns with these 
measurements are now added in your Selection Table. 

(vi) Go to Menu Bar→View→Choose Measurements. Select “Low Freq” and “High Freq” in Displayed Measurements
and click on » to place them in Available Measurements (these columns are not necessary). Click OK.

(vii) Go to Menu Bar→View→Add Annotation Column, name it “Note #”.

(viii) Go to Menu Bar→View→Add Annotation Column, name it “Internote interval #”.

(ix) Go to Menu Bar→View→Add Annotation Column, name it “Note period #”.

(x) Go to Menu Bar→View→Add Annotation Column, name it “Call #”.

(xi) Go to Menu Bar→View→Add Annotation Column, name it “Silent interval #”. You can reorganize the annotation
columns at any time by going to Menu Bar→View→Reorder Annotation Columns.

(xii) Identify where the call starts on the oscillogram. 

(xiii) Click in the oscillogram view to create a selection border at the start of the call (Fig. S7). You need to zoom in on the
sequence to better detect where the call starts. Most of the time the call (first note) starts with a rapid rise of
amplitude and is therefore easy to detect.

Table S1. Example of an Excel table into which you can import the data acquired in Raven. Note RR = note repetition rate. Pulse
R = pulse rate.

Figure S6. Selection Table with default columns, see text for explanation.

Figure S7. Duration of call is determined by using the Selection borders, see text for explanation.


(xiv) Select the first call from the beginning of its first note to the end of its last note by moving the selection border to 
the right. You can still modify your selection afterwards by playing with the selection borders. To do so, grab them 
from the horizontal position marker (in pink in the view). 

(xv) Estimating the end of the call/note is not always obvious, especially in case of background noise. Playing with the
selection and zooming in on the structure of the waveform helps to determine where the call/note stops. In some calls,
several low-energy pulses occur at the end of the note, making the decision somewhat arbitrary.

(xvi) If you have trouble estimating the beginning and the end of the call due to significant background noise (Fig. S8),
refer to points # xvii-xxii below. If not, go to point # xxiii. However, if background noise is too intense it is better
not to use the sequence for measuring temporal variables; the same applies when multiple calls overlap
during a chorus. See box comment above on adapting measurement precision to recording quality.

(xvii) Using the selection borders, select a short segment (e.g., 0.01 s) of background noise shortly before the note starts 
(Figs. S8-S9). 

(xviii) Check the amplitude peak of that segment in Table 1 (in the column “Peak Amp (U)”; Fig. S9).

Figure S8. Example of a call with significant background noise (Rhinella martyi calling close to a fast-flowing stream). The
end of the call is hard to detect due to the noise produced by flowing water. Lower oscillogram view shows the location of the
beginning of the call (after zooming 5 times along the x-axis and 2 times along the y-axis).

(xix) Use that amplitude peak as a threshold to discriminate between the note and the background noise. To do so, zoom 
in at the end of the note (Fig. S9). 

(xx) Using the selection borders, select a short segment of the same length as before the call (here 0.01 s) and check
where the peak amplitude is equal to or below your threshold (in the “Peak Amp” column; Fig. S9).

(xxi) Once you reach amplitude equal to or lower than your threshold you can consider this as the end of the note (call) 
(Fig. S9). 

(xxii) Select the call between the two 0.01 s segments (Fig. S10).

Figure S9. How to discriminate between call and background noise; see text for explanation.

(xxiii) While selecting the call, you can see some data appearing in “Table 1”. A dashed-line square appears in front of
the number of the selection (first column) and your selection appears dashed-lined in the oscillogram view as well.

(xxiv) Once your first call is precisely selected between selection borders (Fig. S10), press Enter.

Figure S10. Final selection of the correct call duration; see text for explanation.


(xxv) A window named “Annotate Selection 1” appears. Go to “Call #”, type 1, and click OK. By doing this you specify 
that the selection you made corresponds to the first call of your sequence. 

(xxvi) As soon as you create a new selection by clicking in the view, the previous selection becomes solid-lined and
appears in light blue in the oscillogram view. That selection is deactivated (you cannot modify it); to reactivate the
selection, click on it in the Selection Table. The corresponding “Delta Time” in Table 1 is the duration of the first
call (i.e., the difference between starting and ending time). The “Begin Time” and the “End Time” correspond to
where the call starts and stops within the recording sequence, respectively.

(xxvii) Now select the first note of the call by applying the same procedure as explained above for the first call, but when
the window “Annotate Selection” appears, type 1 in Note # and 1 in Call #. By doing this you specify that this new
selection is the first note of the first call in your sequence. The “Delta Time” of that selection is the duration of the
first note.

(xxviii) Using the same procedure, calculate the duration of the internote interval by selecting the time between the end of
the note and the beginning of the next note. When the window “Annotate Selection” appears, type 1 in Internote 
interval # and 1 in Call #. By doing this you specify that this selection is the first internote interval of the first call 
in your sequence. The “Delta Time” of that selection is the duration of the first internote interval. 

(xxix) Apply the same procedure for the note period, selecting from the beginning of the first note to the beginning
of the second note. When the window “Annotate Selection” appears, type 1 in Note period # and 1 in Call #. By
doing this you specify that this selection is the first note period of the first call in your sequence. The “Delta Time”
of that selection is the duration of the first note period.

(xxx) Apply the same procedure to all notes, internote intervals, and note periods of the first call, naming them accordingly.

(xxxi) To determine the silent intervals, start a new selection from the end of the first call to the beginning of the next call.
This is the first silent interval of your sequence.

(xxxii) Then apply the same procedure to the next call, and so on until you analyzed all the calls of your sequence. 

(xxxiii) Determine (count) the number of notes per call. Report the information in your Excel table. 

(xxxiv) Measure the time between the beginning of the first note and the beginning of the last note using the technique
explained above. Divide the number of notes included within this period by the time in seconds. This is the note
repetition rate (i.e., the rate at which notes are produced) expressed in notes/s. Perform the same for all the calls of
your sequence. Report that measurement in your Excel table.

(xxxv) Check for the presence/absence of amplitude modulation by zooming in on the notes. Describe the modulation.

(xxxvi) If the amplitude is obviously and distinctly modulated, the note is said to be either pulsatile or pulsed (see
definitions above). If appropriate, determine the number of pulses per note. Perform the same for all the notes of
the call. Report these measurements in your Excel table. In case of strongly pulsed notes, pulse duration should be
calculated as well, following the same procedure as for calls/notes.

(xxxvii) If the note is pulsed, determine the pulse repetition rate using the same technique as in point # xxxiv, but within the
note. Perform the same for all the notes of the call. Report these measurements in your Excel table.

(xxxviii) Finally, determine the call rate by calculating the number of calls produced per minute.
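
The noise-threshold procedure of points # xvii-xxii can also be mimicked outside Raven. The following R sketch is only an illustration under stated assumptions: the waveform is held in a plain numeric vector, the function name find_note_end and the synthetic test signal are hypothetical, and the 0.01 s segment length is taken from the example above. It uses the peak amplitude of a noise segment just before the note as the threshold and scans the following segments for the first one whose peak is at or below that threshold.

```r
# Hypothetical helper: locate the end of a note with the noise-threshold rule
# (points xvii-xxii). 'wave' is a numeric vector of samples, 'sr' the sampling
# rate in Hz, 'note_start' the start of the note in seconds.
find_note_end <- function(wave, sr, note_start, seg = 0.01) {
  n_seg <- round(seg * sr)                      # samples per segment
  start <- round(note_start * sr) + 1           # first sample of the note
  noise <- wave[(start - n_seg):(start - 1)]    # noise segment just before the note
  threshold <- max(abs(noise))                  # peak amplitude of the noise
  pos <- start
  while (pos + n_seg - 1 <= length(wave)) {
    segment <- wave[pos:(pos + n_seg - 1)]
    if (max(abs(segment)) <= threshold) {
      return((pos - 1) / sr)                    # time (s) at which this segment begins
    }
    pos <- pos + n_seg
  }
  NA_real_                                      # no segment fell to the threshold
}

# Synthetic example (assumed values): 0.1 s of weak noise, a 0.2 s tone,
# then 0.2 s of weak noise again
sr <- 44100
set.seed(1)
tone_time <- (0:(round(0.2 * sr) - 1)) / sr
wave <- c(runif(round(0.1 * sr), -0.01, 0.01),
          sin(2 * pi * 3000 * tone_time),
          runif(round(0.2 * sr), -0.005, 0.005))
note_end <- find_note_end(wave, sr, note_start = 0.1)
note_duration <- note_end - 0.1                 # estimated note duration in seconds
```

The resolution of such an estimate is limited by the segment length, so it is no more precise than the manual procedure; it merely makes the decision rule explicit and repeatable.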
B.3. Spectral structure


Spectral variables can be measured using a frequency analysis tool from the spectrogram in Raven, but should not be manually
assessed from the spectrogram visualization itself. A Power Spectrum should always be consulted to assess the distribution of
frequencies in cases where automated measuring tools cannot be fully trusted, especially in poor-quality recordings with a lot of
background noise. The following protocol explains the use of the automated tool only, assuming the existence of
high-quality recordings. The description of spectral properties should include:

(1) Presence/absence, and number of visible harmonics (note that the number of visible harmonics can depend on 
spectrogram settings, performance of your microphone, and quality of recording); 

(2) Dominant frequency; 

(3) Frequency modulation and shape of the dominant frequency; 

(4) Bandwidth (preferably 90%- or 95%-bandwidth; see section B.4 for other options).


Optional: 

(5) Fundamental frequency. Report if it is not recognizable (see explanation earlier in the text); 

(6) Frequency of highest harmonic; 

(7) Frequency of each visible harmonic. 


Measure the spectral variables in calls with visible harmonics as follows: 

(i) Go to Side Panel→Views, and check “Spectrogram 1”. Uncheck “Waveform 1” (you can check it again later
whenever necessary).

(ii) Zoom in enough on the sequence to be able to distinguish the harmonics within the note. Adjust Brightness and
Contrast (in Tool Bars) to be sure that you see all the harmonics (see remarks above). Modifying the Color Scheme
helps; to do so, go to Menu Bar→View→Color Scheme→Standard Gamma II (or any other color scheme you
prefer).

(iii) Count the number of visible harmonics. Report that number in your Excel table. 

(iv) Check the shape of the harmonics. Is there any modulation? 

(v) Locate the harmonic in which the greatest amount of sound energy is concentrated (shown as the brightest 
harmonic in the spectrogram; use a color scheme!). This is the dominant frequency of the note. You can detect this 
better by slightly increasing the brightness. 

(vi) The dominant frequency of each note is displayed in the column “Max Freq (Hz)” (Raven already calculated this
based on the note selection you made before).

(vii) Optional: to calculate the frequency of any harmonic other than the overall dominant frequency, simply select the
harmonic alone in the spectrogram. The frequency is displayed in the column “Max Freq (Hz)” (Fig. S11).

Figure S11. How to calculate the maximum frequency of each harmonic; see text for explanation.

When describing harmonics (and especially when comparing among specimens, populations or species), consider
the possibility of false harmonics, the dependence of their visualization on FFT bandwidth settings, and the
dependence of visibility of high-frequency harmonics on recording distance and sometimes recording equipment.
Also, in many recordings (especially of low-frequency calls) the fundamental frequency might be impossible to
determine reliably, because a lot of environmental background noise occurs in the low frequency range (0-2 kHz)
in particular. In cases where fundamental frequency cannot be reliably determined, it is better to report only
dominant frequency and approximate prevalent bandwidth.


B.4. Alternative method for measuring temporal and spectral parameters 

In many instances, especially when analyzing sounds recorded in nature, it is hard to determine precisely some parameters, 
mainly due to background noise or differences in brightness and contrast configurations. Possible solutions have been proposed 
before (see Figs. S9-S10). Alternatively, it is possible to rely on automatic sound analyses in Raven. 

In the same way that you can automatically acquire the dominant frequency (achieved with the Max Freq or Peak Freq 
tools), you can also access the maximum and minimum or fundamental frequencies, excluding the lower and higher portions of 
the call that concentrate 5% of energy each. That is obtained with the measurement tools Frequency 95% and Frequency 5% 
respectively (Fig. S12). 

Besides spectral traits, the duration of the call/note can be obtained similarly; i.e., using the measurement tools Time 5% and
Time 95% you obtain the initial and final times of a call/note (Fig. S12). This excludes the initial and final portions of
the call/note that each contain 5% of the energy, delimiting the portion that holds 90% of the energy of that call/note.
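
To make the logic of the Time 5% and Time 95% measurements concrete, the following R sketch computes the times at which the cumulative energy of a selection reaches 5% and 95%. It is an approximation under stated assumptions and not Raven's own implementation: energy is taken here as the sum of squared sample values of the waveform, and the function name energy_times and the synthetic signal are hypothetical.

```r
# Hypothetical helper: times (s) at which the cumulative energy of a selection
# reaches the given fractions. Energy is approximated by squared sample values.
energy_times <- function(wave, sr, probs = c(0.05, 0.95)) {
  energy <- cumsum(wave^2) / sum(wave^2)        # cumulative energy fraction (0 to 1)
  idx <- sapply(probs, function(p) which(energy >= p)[1])
  (idx - 1) / sr                                # convert sample index to seconds
}

# Synthetic selection (assumed values): 0.1 s silence, a 1 s tone, 0.1 s silence
sr <- 8000
silence <- rep(0, round(0.1 * sr))
tone <- sin(2 * pi * 1000 * (0:(sr - 1)) / sr)
wave <- c(silence, tone, silence)

bounds <- energy_times(wave, sr)                # approximate Time 5% and Time 95%
duration_90 <- diff(bounds)                     # approximate Duration 90%
```

Because the silent margins carry no energy, the two bounds fall inside the tone regardless of how generously the selection was drawn, which is the property that makes these measurements less sensitive to the placement of the selection borders.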


[Figure S12 annotations: Maximum frequency (manual); Dominant frequency (automatic, using either the Max Freq or Peak Freq
measurement tools); Minimum frequency (manual); Final duration at 95% of energy (automatic, using the Time 95% measurement
tool); Maximum frequency at 95% of energy (automatic, using the Freq 95% measurement tool); Minimum frequency at 5% of energy
(automatic, using the Freq 5% measurement tool); Initial duration at 5% of energy (automatic, using the Time 5% measurement tool).]


Figure S12. Spectrogram of the advertisement call of Ischnocnema juipoca (Brachycephalidae) indicating automatic 
measurement tools that could be used to reduce biases of identifying the spectral and temporal boundaries of the calls. The 
range between maximum frequency (manual) and minimum frequency (manual) is named the bandwidth and is measured with 
careful use of spectral analysis tools in high-quality recordings. In low quality recordings it can be roughly estimated from the 
spectrogram and should then be referred to as approximate prevalent bandwidth. The range between the minimum and 
maximum frequencies at 95% of energy is the 95% bandwidth. See text for explanation. 

Additionally, you can use the measurement tools Bandwidth 90% and Duration 90%, which simply subtract Frequency 5% from
Frequency 95%, or Time 5% from Time 95%, respectively, to obtain the frequency bandwidth and delta time in which 90% of the
energy of the call/note is concentrated. Excluding 10% of the energy in the frequency and time domains reduces subjectivity,
as manually set boundary limits are influenced by brightness and contrast configurations. These values are independent of
such settings and are affected only by FFT adjustments. The same rationale could be applied to counting harmonics. The number
of harmonics could be counted only within this 90% frequency bandwidth, excluding those that are hard to measure and probably
meaningless to anuran communication (and that sometimes are even artifacts of recording, digitizing, or analysis settings). If
bandwidth cannot be objectively measured due to high background noise or other problems with the recordings, assess the
approximate prevalent bandwidth from the power spectrum or spectrogram, and report it as such (see main text for definition).
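
The same idea can be sketched for the frequency domain. The R example below estimates the frequencies that bound the central 90% of spectral energy from a simple power spectrum of the whole selection. It is a minimal sketch, assuming a single FFT over the selection rather than Raven's spectrogram-based calculation; the function name freq_percentiles and the synthetic three-harmonic signal are hypothetical.

```r
# Hypothetical helper: frequencies (Hz) below which the given fractions of the
# spectral energy of a selection are found.
freq_percentiles <- function(wave, sr, probs = c(0.05, 0.95)) {
  n <- length(wave)
  half <- seq_len(floor(n / 2))                 # bins from 0 Hz to just below Nyquist
  power <- Mod(fft(wave))[half]^2               # power spectrum
  freqs <- (half - 1) * sr / n                  # frequency of each bin (Hz)
  cum <- cumsum(power) / sum(power)             # cumulative energy fraction
  sapply(probs, function(p) freqs[which(cum >= p)[1]])
}

# Synthetic note (assumed values): a weak component at 1 kHz and two strong
# harmonics at 2 and 3 kHz
sr <- 8000
tt <- (0:(sr - 1)) / sr
wave <- 0.1 * sin(2 * pi * 1000 * tt) +
        sin(2 * pi * 2000 * tt) +
        sin(2 * pi * 3000 * tt)

limits <- freq_percentiles(wave, sr)            # approximate Freq 5% and Freq 95%
bandwidth_90 <- diff(limits)                    # approximate Bandwidth 90%
```

In this synthetic case the weak 1 kHz component holds less than 5% of the energy and therefore falls outside the 90% bandwidth, which illustrates why harmonics counted only within that band exclude faint components.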


C. Writing and illustrating the description 

At this stage all the data are compiled and almost ready to be used for the description. Export your selection table by going to
Menu Bar→File→Save Selection Table “Table 1” As... If a message about missing frequencies pops up, click “Yes”. Select the
folder where you want the file to be saved on your computer. Click OK.

Import the data into your Excel table (Table S1). Calculate the mean and standard deviation for all the data. Refer to Checklist IV
for additional recommendations.
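
For readers who prefer to compile the derived variables in R rather than in a spreadsheet, the following sketch shows one way to do it. It is illustrative only: the data frame and its column names (call, note, begin, end) are hypothetical stand-ins for an exported selection table, and two choices are assumptions about how the rates are meant to be counted, namely that the note repetition rate uses the n - 1 note periods lying between the first and last note onsets, and that the call rate is taken from call onset to call onset.

```r
# Hypothetical note selections: two calls with three notes each (times in s)
notes <- data.frame(
  call  = c(1, 1, 1, 2, 2, 2),
  note  = c(1, 2, 3, 1, 2, 3),
  begin = c(0.00, 0.10, 0.20, 5.00, 5.12, 5.24),
  end   = c(0.06, 0.16, 0.26, 5.07, 5.19, 5.31)
)
notes$duration <- notes$end - notes$begin       # corresponds to "Delta Time"

# Note repetition rate per call (notes/s); assumption: n - 1 note periods
# between the onset of the first note and the onset of the last note
note_rr <- sapply(split(notes, notes$call), function(d) {
  (nrow(d) - 1) / (max(d$begin) - min(d$begin))
})

# Call rate (calls/min); assumption: measured from call onset to call onset
call_begin <- tapply(notes$begin, notes$call, min)
call_rate <- (length(call_begin) - 1) / (max(call_begin) - min(call_begin)) * 60

# Descriptive statistics for a call description
summary_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), n = length(x))
}
duration_summary <- summary_stats(notes$duration)
```

Whichever counting convention is used, it should be stated in the methods of the description, because the two conventions differ noticeably for calls with few notes.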

As explained earlier in the text (see Checklist IV), illustrating the call properly is an important step in the call description. 
You should at least provide an illustration of the oscillogram and spectrogram of: 

(1) A few consecutive calls; 

(2) One call; 

(3) One note; 

(4) We suggest providing a Spectrogram Slice View (the Power Spectrum) as well (optional). 

Raven allows you to export images, but before exporting these images some clean up is often necessary (although not 
mandatory). Some filtering can be done in Raven, and some additional clean up may also be done using any image-editing 
software. Keep in mind that what you want here is to propose the best graphic representation of the call. Therefore, filtering 
affecting some parts of the call is acceptable (unlike in the analysis of temporal and spectral variables). 

Produce illustrations as follows: 

(i) Select a few consecutive calls that have as little background noise as possible and are representative of the
species' call (we suggest three calls, but this of course depends on the call type you analyze).

(ii) Zoom enough in the sequence to have the entirety of these calls in the view. 

(iii) Using the selection borders, select the three calls and copy them (Menu Bar→Edit→Copy). Make sure you select
and copy a segment corresponding to a "rounded" time frame such as 0.5 s, 1 s, 2 s or similar (in this example, 30 s;
Fig. S13). If not possible, you can perform additional clipping or extend the duration of the segment after pasting
the copied sound.

(iv) Go to Menu Bar→File→New→Sound Window.

(v) Paste the sequence in the new window (Menu Bar→Edit→Paste).

(vi) Go to Menu Bar→View→Color Scheme→Standard Gamma II (or any other color scheme you prefer).

(vii) Select the Waveform View and zoom along the y-axis to have the waveform high enough to fit the whole view (Fig. 
S13). 

(viii) Select the Spectrogram View and adjust Brightness and Contrast (in Tool Bars) to make sure that all frequencies are 
well visible (Fig. S13). 

(ix) Using the Selection Borders, select the part of the sequence below the fundamental harmonic (in the Spectrogram
view; Fig. S13).

(x) Go to Menu Bar→Edit→Band Filter→Out Active Selection (Fig. S13).

(xi) Zoom along the x-axis and remove unnecessary noise before the first note by selecting it using the selection
borders, then go to Menu Bar→Edit→Band Filter→Out Active Selection (Fig. S14).

(xii) Remove all unnecessary noise by performing the same for the three calls.

(xiii) Zoom out to see the entirety of the three calls in the window.

Figure S13. How to clean up the oscillogram for illustration. Filtering below fundamental frequency; see text for explanation.

Figure S14. How to clean up the oscillogram for illustration. Filtering before the first note; see text for explanation.

(xiv) Go to Side Panel→Components, check “Axes” and “Axis Title”, uncheck all other components (Fig. S14).

(xv) Go to Menu Bar→File→Export Image Of→All Views of Window “name of your file”.

(xvi) A window named “Save Image As...” opens. Give a name to your image, select TIFF files as file format, and
select the folder where you want the image file to be saved on your computer. Click OK. You now have an image
with the representation of three calls (Fig. S15).

Figure S15. Illustration of three calls of Allobates amissibilis, as exported with Raven. Note that only the first call has been
filtered/cleaned up in this example.

(xvii) Check the unchecked components again in “Components”. 

(xviii) Zoom in the sequence until you have the entirety of one call in the view. 

(xix) Perform the same procedure as above (select the call, import it in a new window and apply some filtering). You
should get something like Figure S16.

(xx) Zoom in the sequence until you have the entirety of one note in the view. 

(xxi) Perform the same procedure as above (select the note, import it in a new window and apply some filtering), but a 
few next steps can be performed after cleaning up and saving the image of the note (optional). 

(xxii) Grab the vertical Position Marker (in pink in the view), and place it in the waveform where the amplitude is 
maximal (Fig. S17). 

(xxiii) Go to Menu Bar→View→New→Spectrogram Slice View (or use the corresponding icon in the Raven Tool Bars;
Fig. S17).

(xxiv) A window named “Configure New Spectrogram Slice View” opens. 

(xxv) In “Type” select “Blackman”, in “3 dB Filter Bandwidth” type 150 Hz. 

While filtering is a recommendable procedure to produce high-quality spectrograms and oscillograms, its use
should be restricted to cases where it can be unambiguously decided which sounds are part of the calls and which
are background noise (e.g., environmental noise, sounds produced by other animals or by conspecific males).
When using illustrations based on filtered/cleaned sounds, it is even more important to deposit the unfiltered
original recording in a publicly accessible sound repository, to allow other researchers to confirm your
findings if necessary.


Figure S16. Illustration of one call of Allobates amissibilis as exported with Raven, ready for publication.

Figure S17. How to add a spectrogram slice view to your figure; see text for explanation. Red circle highlights the “New
Spectrogram Slice View” tab in the Tool Bars.

(xxvi) Leave the other values as default and click OK. Note that you can produce different spectrogram slice views, trying
different values in order to get the best result (i.e., a clear peak for each harmonic).

(xxvii) Go to Menu Bar→File→Export Image Of→All Views of Window “name of your file” and apply the same
procedure as explained before.

(xxviii) If you wish to perform more clean up on your image you can open it in the image-editing software of your choice.
The final result should resemble Figure S18.

Figure S18. Illustration of one note of the call of Allobates amissibilis as exported with Raven, ready for publication.

You may additionally provide visual comparisons with calls of similar species; in that case simply follow the same procedures
as explained above.


For comparative purposes (comparing calls of different specimens, populations, or species), make sure that you
select exactly the same time resolution and the same FFT bandwidth for all spectrograms/oscillograms.


APPENDIX IB. A practical guide to sound processing and spectrogram illustration with R 

We will assume that users already have a digital sound file containing the calls, stored in .wav audio file format. The first
necessary step is to edit this sound to remove all unnecessary sections or extraneous sounds, and to select a section of sound of
the desired duration. As stated before, it is important to select the same sound duration in all calls that are shown for
comparative purposes. Using Raven (or other sound-editing software), select a recording section, for example of exactly 5000
ms (5 s), 1000 ms (1 s), or 500 ms. Then cut the section, paste it into a new file, and save this new file under a new name. In
general, there is no need to illustrate stereo recordings, and these can be converted to mono by selecting one of the two
original stereo channels.

Always make sure that the sound you want to analyze is not oversaturated (i.e., the maximum amplitude peaks in the
oscillogram must not reach the scale limits).
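
A quick numerical check for oversaturation can complement the visual inspection of the oscillogram. The sketch below is a minimal illustration, assuming integer PCM samples held in a numeric vector; the function name is_clipped and the sample values are hypothetical.

```r
# Hypothetical helper: TRUE if any sample reaches the limit of the integer
# amplitude scale for the given bit depth
is_clipped <- function(samples, bit = 16) {
  full_scale <- 2^(bit - 1) - 1                 # 32767 for 16-bit audio
  any(abs(samples) >= full_scale)
}

ok_samples  <- c(-1200, 8000, -15000, 20000)    # peaks well below the scale limits
bad_samples <- c(-1200, 32767, -32768, 20000)   # peaks at the scale limits

ok_result  <- is_clipped(ok_samples)
bad_result <- is_clipped(bad_samples)
```

For a Wave object read with tuneR, the samples of the left (or only) channel are stored in the slot sound@left and the bit depth in sound@bit, so the check could be called as is_clipped(sound@left, sound@bit).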

If necessary, this selected sound section can be further processed. In general, in order to maintain the original
characteristics of the sound, filtering should be avoided or used with extreme care (see Raven SOP above). If, due to particular
recording conditions, spectral filtering is absolutely necessary, the spectral characteristics of the focal call should be considered
when applying the filter, and the characteristics of the filter should be published.

Although all this preprocessing of sounds could potentially be performed in the R environment, it is best done with audio
editing software. It can be easily done with Raven (see SOP above) or with software such as Audacity® (free) or Cool Edit Pro
(now rebranded as Adobe® Audition®). These programs allow for easy zooming in and out of the temporal domain of the
sound and can perform good-quality edits such as selecting, copying, and filtering (Fig. S19).

Once a sound fragment containing the focal call (or calls) has been saved as a .wav file, it can be processed in R. 

R is an open-source statistical programming environment that can be extended into an increasing number of applications
using modules or packages. One of its main strengths is its graphic output, which allows for the production of well-designed
publication-quality plots (R Development Core Team 2014). This brief manual will guide you through the necessary steps to
produce quality spectrogram-oscillogram composite plots to illustrate bioacoustical studies. The software and parameters are
identical to those used to produce the illustrations presented in this paper.

R can be freely downloaded from the internet (https://cran.r-project.org/) and the installation procedure is relatively
straightforward. However, R is distributed as a basic version (R console), which can be customized and extended using
packages and scripts. The packages can be downloaded and installed from the packages menu in the R console. Packages are
copied in different repositories and the user must first select one repository (or mirror). Once a repository is selected, the
necessary packages can be installed from the packages menu, which will display a list of available packages. Simply scroll
down the list and pick the desired package. In order to set up the sound processing capabilities in R, users will have to
install two packages: seewave (Sueur et al. 2008a) and tuneR (Ligges et al. 2014). These packages contain numerous functions
for sound editing, synthesis, and measurement, and their proper use requires some basic knowledge of R commands and
computer language that is beyond the scope of this guide. We will concentrate only on the production of spectrogram-oscillogram
composites, for which we have designed a script with four easy steps.





Figure S19. Sound editing and filtering with Audacity® software (A-C), and production of spectrograms and oscillograms
with seewave (D-E). A) Sound file of a recording of advertisement calls of Eleutherodactylus leberi from eastern Cuba under
high ambient noise; the desired sound section has been selected for copying into a new file. B) Oscillogram of the selected
1 s section containing one advertisement call; the power spectrum shows a noise band from crickets with a peak around 5
kHz. C) The same sound after applying a low-pass filter (cut-off frequency = 3.5 kHz, filter steepness = 36 dB/octave); note
that the amplitude of all sounds above 3.5 kHz has been greatly reduced. These sounds were later processed with the
Spectrogram-Oscillogram R script to produce the corresponding spectrogram-oscillogram composites shown in D) and E).

The entire script is reproduced at the end of the section. The following is a stepwise protocol for performing the commands:

1. Activate the packages:
library(seewave)
library(tuneR)

2. Set the working directory (substitute the text in quotation marks "" with the path to a pre-existing directory on your
computer). It is useful if the input sound file is copied into this same directory.
setwd("Path/to/your/working/directory")

[alternatively, use the command below, as also implemented in our script, which will display a choice menu; note that
choose.dir() is available in R for Windows only]
setwd(choose.dir())

3. Import your sound file into a new R object using the tuneR package (a choice menu will be displayed, and the input sound file
can be selected).

infile <- file.choose()
sound <- readWave(infile)

4. Plot oscillogram and spectrogram in two panels together and export the figure:

png(filename = paste(as.character(infile), "512FFT", "png", sep = "."), width = 1200, height = 1000, res = 200)

spectro(sound, flim = c(0, 10), wl = 512, ovlp = 90, osc = TRUE, heights = c(2, 2))

dev.off()

The last step of the script draws a spectrogram in a frequency range of 0-10 kHz using an FFT size of 512 points, a
Hanning window, and 90% overlap, and plots an oscillogram of the waveform. The optional parameter “heights” controls the
relative heights of the two graphic panels. By default this function normalizes the amplitude of the input file (uniformly increasing
the values so that the peak power is the maximum amplitude), which is a useful feature in this case. In this example, the output
is directed to a .png image file named after the input sound.
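
The choice of FFT size and overlap determines the frequency and time resolution of the resulting spectrogram, which matters when spectrograms are to be compared. The short R sketch below works through the arithmetic for the settings used above; the sampling rate of 44100 Hz is an assumed example value, and the variable names are illustrative only.

```r
# Resolution implied by wl = 512 and ovlp = 90 in the spectro() call above.
# The sampling rate is an assumed example value; use that of your own file.
sr   <- 44100                                   # sampling rate (Hz)
wl   <- 512                                     # FFT window length (samples)
ovlp <- 90                                      # overlap between windows (%)

freq_res  <- sr / wl                            # spacing of frequency bins (Hz)
win_dur   <- wl / sr                            # duration of one window (s)
hop       <- wl * (1 - ovlp / 100)              # nominal samples between windows
time_step <- hop / sr                           # nominal time between columns (s)

resolution <- c(freq_res_Hz  = freq_res,
                window_ms    = win_dur * 1000,
                time_step_ms = time_step * 1000)
round(resolution, 2)
```

Doubling wl halves the spacing of the frequency bins but doubles the window duration, so fine temporal structure such as short pulses becomes more blurred; this trade-off is the reason for keeping the same settings across all spectrograms that are compared.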


##########################################################################
# Automatic generation of spectrogram - oscillogram composites           #
# This script takes one ".wav" file in a folder of choice                #
# and produces a "<file name>.512FFT.png" image including the            #
# spectrogram and oscillogram with the selected settings.                #
# Standard settings are provided but can be modified at will.            #
##########################################################################

# 1. Activate the packages:
library(seewave)
library(tuneR)

# 2. Set the working directory (a choice menu will be displayed; choose.dir() is Windows only)
setwd(choose.dir())

# 3. Import your sound file into a new R object using the tuneR package (a choice menu will be displayed)
infile <- file.choose()
sound <- readWave(infile)

# 4. Plot oscillogram and spectrogram in two panels together and export the figure
png(filename = paste(as.character(infile), "512FFT", "png", sep = "."), width = 1200, height = 1000, res = 200)
spectro(sound, flim = c(0, 10), wl = 512, ovlp = 90, osc = TRUE, heights = c(2, 2))
dev.off()

##########################################################################